
View Full Version : R520 Infomania



JoshMST
06-Apr-2005, 21:32
I thought we were talking about hardware decisions and not business decisions? Yes, I think everyone (and history) will agree with you that the direction that Greg B. took 3dfx was the worst possible one to take. Again, if they had put all their efforts into Banshee and released that in January with an 8 MB card, and then, when TnT was about to hit, updated the SKU with a 16 MB version, then they would probably still be around (oh, and if they had decided to not actually buy a struggling STB... that was a terrible decision).

But can you definitively point out that the Voodoo Graphics for the desktop market was totally backwards for the time?

Chalnoth
06-Apr-2005, 21:42
Well, I think the two are directly related. The hardware that they developed just wasn't suitable for the OEM space.

But can you definitively point out that the Voodoo Graphics for the desktop market was totally backwards for the time?
Don't really think it was.

_xxx_
06-Apr-2005, 23:04
So, I am not implying that what you are saying is totally unfounded, but the architectural decisions made for the consumer version of the Voodoo Graphics at that time were not so bad.
But the business decisions were bad. 3dfx pretty much utterly failed to make inroads into the OEM space, which is where the real money is.

With what chip? V2 needed a 2d card and banshee would have been too expensive for average Joe's mass-market PC IMHO.

EDIT:
to myself:"read all posts before being a parrot..."

psurge
07-Apr-2005, 00:41
Assuming a MIMD model, it makes no sense to have a global reservation station if threads are not dynamically reassigned to ALUs on context switches and doing this would require moving large amounts of state (the contents of all registers for the thread). On top of that a global reservation station for N alus would probably be O(N) times bigger than a single local station for the same amount of latency tolerance, and the arbiter would have to be able to pick multiple threads for issue per cycle. IMO it would be inefficient to begin with and wouldn't scale (in clockspeed or ALUs).

That said, I can see a global reservation station servicing the ALUs so long as they all execute the same instruction each cycle (a thread corresponds to N pixels or vertices running in lockstep). If this is the case, then texture fetch coherency would certainly be excellent, but branching would be far from optimal...
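
A rough sketch of the lockstep trade-off described above, assuming a batch of N pixels shares one instruction stream and a divergent branch makes the whole batch issue both sides with the inactive pixels masked off. The function names, the batch size of 16 and the instruction counts are made up for illustration and do not describe any real chip.

```python
def lockstep_cost(takes_branch, then_cost, else_cost):
    """Instruction slots issued for one batch of pixels running in lockstep.

    takes_branch holds one bool per pixel in the batch. A side of the
    branch has to be issued if at least one pixel in the batch needs it.
    """
    cost = 0
    if any(takes_branch):
        cost += then_cost
    if not all(takes_branch):
        cost += else_cost
    return cost


def independent_cost(takes_branch, then_cost, else_cost):
    """Average slots per pixel if each pixel only ran its own path (MIMD-style)."""
    total = sum(then_cost if taken else else_cost for taken in takes_branch)
    return total / len(takes_branch)


coherent = [True] * 16             # every pixel takes the long path
divergent = [True] + [False] * 15  # a single pixel takes the long path

print(lockstep_cost(coherent, 40, 4))      # 40
print(lockstep_cost(divergent, 40, 4))     # 44
print(independent_cost(divergent, 40, 4))  # 6.25
```

In this toy model one stray pixel makes the whole batch pay for both paths (44 slots), while fully independent threads would average 6.25 slots per pixel. That is the "branching would be far from optimal" part; the price of the independent model is the per-thread state and scheduling discussed above.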

Chalnoth
07-Apr-2005, 02:38
Well, I suppose you could implement a system where the state is stored in a lookup table that has a limited amount of state to possibly store. That is to say, if the system is designed to work on 1000 in-flight instructions at a time, you could have N different instructions executed among those.

The problem, then, would be having too many possible branches in a limited area. Run out of state memory and the architecture will stall. I still do wonder how the system will work with latency hiding, though. Due to this it seems like it'd be rather challenging to get texture accesses to be efficient.

Unknown Soldier
07-Apr-2005, 08:02
With what chip? V2 needed a 2d card and banshee would have been too expensive for average Joe's mass-market PC IMHO.

Ye .. just like the R480 was too expensive .. and SLI Nvidia is too expensive. I bet if they did release it .. there would've been a market... up until the TNT was released. ;)

Funnily enough .. it seems that if there is a fast enough chip .. even at a price .. someone will buy it.

Did I mention I bought my X800 Pro for around $670 .. that's because it's the price that it's being sold here for.

US

Ailuros
07-Apr-2005, 08:33
I thought we were talking about hardware decisions and not business decisions? Yes, I think everyone (and history) will agree with you that the direction that Greg B. took 3dfx was the worst possible one to take. Again, if they had put all their efforts into Banshee and released that in January with an 8 MB card, and then, when TnT was about to hit, updated the SKU with a 16 MB version, then they would probably still be around (oh, and if they had decided to not actually buy a struggling STB... that was a terrible decision).

But can you definitively point out that the Voodoo Graphics for the desktop market was totally backwards for the time?

If that isn't way off topic LOL....anyway IMHO it's rather a bundle of various factors and yes I believe "luck" (as in favourable coincidences) and various business decisions play a role too. If their designs had been anything but relatively backwards, then I wonder why they wasted resources developing for Sega's console and lost it almost last minute to Videologic/PowerVR.

We all know by now what a console design means in terms of resources and the usual shifts in an IHV's roadmap; neither NVIDIA nor ATI went unaffected in the more recent past with their respective Microsoft console deals. In fact roadmaps shifted for both IHVs when it comes to their next generation products of the time (NV30 and later "R400").

3dfx didn't exactly have what I call a plethora of engineers back then and developing for some time a console design and on top losing the contract to a competitor, was another negative factor that had quite a significant influence for the company back then overall (other factors of course included). Unless of course someone considers a V2-alike-design somewhat advanced for a console for that timeframe that is....

Ailuros
07-Apr-2005, 08:37
Assuming a MIMD model, it makes no sense to have a global reservation station if threads are not dynamically reassigned to ALUs on context switches and doing this would require moving large amounts of state (the contents of all registers for the thread). On top of that a global reservation station for N alus would probably be O(N) times bigger than a single local station for the same amount of latency tolerance, and the arbiter would have to be able to pick multiple threads for issue per cycle. IMO it would be inefficient to begin with and wouldn't scale (in clockspeed or ALUs).

That said, I can see a global reservation station servicing the ALUs so long as they all execute the same instruction each cycle (a thread corresponds to N pixels or vertices running in lockstep). If this is the case, then texture fetch coherency would certainly be excellent, but branching would be far from optimal...

Probably another dumb question, but how are possible "out of order" vertices getting handled in a MIMD scenario?

_xxx_
07-Apr-2005, 08:51
With what chip? V2 needed a 2d card and banshee would have been too expensive for average Joe's mass-market PC IMHO.

Ye .. just like the R480 was too expensive .. and SLI Nvidia is too expensive. I bet if they did release it .. there would've been a market... up until the TNT was released. ;)

Funnily enough .. it seems that if there is a fast enough chip .. even at a price .. someone will buy it.

Did I mention I bought my X800 Pro for around $670 .. that's because it's the price that it's being sold here for.

US

Yes, but the majority buys the low-end stuff thanks to the noise raised by the high-end. 3Dfx simply didn't have any low-end to accompany their Banshee, it's not like you could turn off a quad back then.

Chalnoth
07-Apr-2005, 09:10
Yes, but the majority buys the low-end stuff thanks to the noise raised by the high-end. 3Dfx simply didn't have any low-end to accompany their Banshee, it's not like you could turn off a quad back then.
They could have done what nVidia did, and shipped one with a 64-bit memory bus.

Unknown Soldier
07-Apr-2005, 11:56
Want to bet that the R520 will have 512MB 800Mhz DDR3 memory to work with? (http://www.beyond3d.com/forum/viewtopic.php?t=21845)

Man .. this is gonna rock with that memory.

US

_xxx_
07-Apr-2005, 12:40
Yes, but the majority buys the low-end stuff thanks to the noise raised by the high-end. 3Dfx simply didn't have any low-end to accompany their Banshee, it's not like you could turn off a quad back then.
They could have done what nVidia did, and shipped one with a 64-bit memory bus.

Wouldn't have made it significantly cheaper though, there was only one version of the chip and the memory used with it was rather cheap in comparison. No point in that from the business side of things. Banshee was already crippled to begin with, so they couldn't cripple the chip any more nor could they use defective chips for the low-end like nV did.

Geo
07-Apr-2005, 14:24
Want to bet that the R520 will have 512MB 800Mhz DDR3 memory to work with? (http://www.beyond3d.com/forum/viewtopic.php?t=21845)

Man .. this is gonna rock with that memory.

US

Anyone care to comment on whether this says anything, if true, about the number of pipes? Said another way, can a 16-pipe critter @~650mhz take full advantage of this --or would it imply more pipes?

Chalnoth
07-Apr-2005, 15:42
Wouldn't have made it significantly cheaper though, there was only one version of the chip and the memory used with it was rather cheap in comparison. No point in that from the business side of things. Banshee was already crippled to begin with, so they couldn't cripple the chip any more nor could they use defective chips for the low-end like nV did.
I don't think the TNT2 M64's were crippled chips. The main difference is cheaper packaging (fewer pins) and cheaper board costs (fewer layers on the board).

Geo
07-Apr-2005, 15:49
Anybody notice that ATI has three booths at E3 and NV has one? What's up with that? Is that one of those "Domino's orders at the Pentagon" indicators?

http://www.e3expo.com/exhibitors/exhibitor_list.asp

MuFu
07-Apr-2005, 16:13
Want to bet that the R520 will have 512MB 800Mhz DDR3 memory to work with? (http://www.beyond3d.com/forum/viewtopic.php?t=21845)

Man .. this is gonna rock with that memory.

US

Anyone care to comment on whether this says anything, if true, about the number of pipes? Said another way, can a 16-pipe critter @~650mhz take full advantage of this --or would it imply more pipes?

http://img119.exs.cx/img119/6319/untitled1copy6fh.png

phenix
07-Apr-2005, 17:08
http://img119.exs.cx/img119/6319/untitled1copy6fh.png


256bit memory interface is running out of gas I guess.

Chalnoth
07-Apr-2005, 17:19
Not really. Remember that we just aren't going to have much higher resolutions than we currently have, and FSAA no longer takes much memory bandwidth over normal rendering. So memory bandwidth demands are going to grow much more slowly than fillrate demands.

trinibwoy
07-Apr-2005, 17:28
Not really. Remember that we just aren't going to have much higher resolutions than we currently have, and FSAA no longer takes much memory bandwidth over normal rendering. So memory bandwidth demands are going to grow much more slowly than fillrate demands.

I'm not sure I understand the disconnect here. As shader tech gets faster won't it require faster access to memory as well? Won't longer and fancier shaders need faster access to texture memory and other buffers?

Geo
07-Apr-2005, 17:29
http://img119.exs.cx/img119/6319/untitled1copy6fh.png

Thanks, MuFu. Even at 800 instead of 700 that looks pretty grim for 24/32. Does make one wonder what NV has up their sleeve, since there appears to be near universal belief (we may be wrong, but we are not unsure :) ) that's where they are going.

Edit: The Baumann Uncertainty Principle

trinibwoy
07-Apr-2005, 17:33
Thanks, MuFu. Even at 800 instead of 700 that looks pretty grim for 24/32. Does make one wonder what NV has up their sleeve, since there appears to be universal belief (we may be wrong, but we are not unsure :) ) that's where they are going.

Maybe they're doing the XDR thing with a 24-piper....

Are there any fundamental reasons why a switch to XDR at this juncture is unfeasible? Is it just too expensive compared to GDDR3?

MuFu
07-Apr-2005, 17:33
Thanks, MuFu. Even at 800 instead of 700 that looks pretty grim for 24/32. Does make one wonder what NV has up their sleeve, since there appears to be universal belief (we may be wrong, but we are not unsure :) ) that's where they are going.

Well perhaps it's simply a case of them having a larger, lower clocked core (again).

Dave Baumann
07-Apr-2005, 17:35
Maybe there is very little need for a bunch of extra pixel fill-rate.

Geo
07-Apr-2005, 17:38
Thanks, MuFu. Even at 800 instead of 700 that looks pretty grim for 24/32. Does make one wonder what NV has up their sleeve, since there appears to be universal belief (we may be wrong, but we are not unsure :) ) that's where they are going.

Well perhaps it's simply a case of them having a larger, lower clocked core (again).

Yeah, I thot about that after I posted. 24 x 500 still is not exactly "well fed" on that chart. At 700mhz memory I make it 3.73 and 800mhz 4.26.
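
For anyone wanting to reproduce those two numbers: they come out as bytes of memory bandwidth per pixel pipe per core clock, if you assume a 256-bit bus and double-data-rate memory. Neither assumption is stated in the post, and the function and parameter names below are just illustrative.

```python
def bytes_per_pixel_clock(mem_mhz, pipes, core_mhz, bus_bits=256, data_rate=2):
    """Memory bandwidth available per pixel pipe per core clock, in bytes.

    bus_bits=256 and data_rate=2 (DDR) are assumptions, not given in the post.
    """
    bandwidth = mem_mhz * 1e6 * data_rate * bus_bits / 8  # bytes per second
    pixel_rate = pipes * core_mhz * 1e6                   # pixels per second
    return bandwidth / pixel_rate


for mem_mhz in (700, 800):
    print(mem_mhz, round(bytes_per_pixel_clock(mem_mhz, 24, 500), 2))
# 700 3.73
# 800 4.27
```

The 800MHz case is 4.2667, which rounds to 4.27; the 4.26 above is the same value truncated.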

Geo
07-Apr-2005, 17:40
Maybe there is very little need for a bunch of extra pixel fill-rate.

Hmm! I think that's the first time I've seen you make that observation in an NV context. Previous post edited accordingly. :)

MuFu
07-Apr-2005, 17:49
Well there's always the 16 ROP/24 shader pipes scenario to consider.

Chalnoth
07-Apr-2005, 17:54
I'm not sure I understand the disconnect here. As shader tech gets faster won't it require faster access to memory as well? Won't longer and fancier shaders need faster access to texture memory and other buffers?
Sure, you'll have to read from more textures and stuff, but math ops are going to take over as the dominant factor in performance (for most applications). So you're still going to need more pure processing power than input/output bandwidth.

PeterAce
07-Apr-2005, 18:05
http://img119.exs.cx/img119/6319/untitled1copy6fh.png

256bit memory interface is running out of gas I guess.

Considering this :

http://www.beyond3d.com/forum/viewtopic.php?p=484944#484944

And assuming it's a more efficient mem bus (maybe the speculated 'token-ring-esque' style controller).

What impact would this have?

Dave Baumann
07-Apr-2005, 18:09
I'm not sure I understand the disconnect here. As shader tech gets faster won't it require faster access to memory as well? Won't longer and fancier shaders need faster access to texture memory and other buffers?

Two common things that most people in the industry agree on: Instructions per texture are increasing and instructions per pixel are increasing.

Geo
07-Apr-2005, 18:11
Well there's always the 16 ROP/24 shader pipes scenario to consider.

That makes my head hurt. Even after re-checking Wavey's 6600GT review. :wink:

I think it's this para you're pointing at, but I'm only marginally well parsing it:

Such an arrangement isn't as much of an issue these days as it used to be - with only one texture unit per fragment pipeline even Trilinear filtering takes 2 cycles, and as pixel shader lengths increase more and more cycles will be spent per pixel in the Pixel Shader element of the pipeline, so with a one-to-one mapping of Pixel Shader Pipelines to ROP outputs a lot of the time the ROP's are going to be spent idle anyway. Due to these arrangements, NV40 has a very high peak theoretical fillrate but, due to the nature of the usage of modern pipelines, rarely will it get close to reaching it, whilst NV43 has a much lower peak fillrate but because of the processing capabilities in relation to output capabilities it is more likely to get closer to that peak in general use.

Joe DeFuria
07-Apr-2005, 18:21
Two common things that most people in the industry agree on: Instructions per texture are increasing and instructions per pixel are increasing.

I find it depressing that lots of people seem to be continually ignoring your hints such as the above, and (paraphrasing) "forget about the idea of a traditional pipeline...it's meaningless going forward."

It seems pretty obvious to me (based on yours and others hints), that we should not expect R520 to be a new pixel fill rate (pixel writes per cycle) behemoth relatively speaking. On the other hand "shader throughput per cycle" seems to be the fundamental change for next gen.

Geo
07-Apr-2005, 18:36
I find it depressing that lots of people seem to be continually ignoring your hints. . . .

It seems pretty obvious to me (based on yours and others hints), that we should not expect R520 to be a new pixel fill rate (pixel writes per cycle) behemoth relatively speaking. On the other hand "shader throughput per cycle" seems to be the fundamental change for next gen.

Oh, I dunno what context you're in on "lots of people". I think we get that pretty well on this thread, tho not everyone appears to have signed-up. Even those I don't think are ignoring --they think they know better. :) I do like to test new bits of info against the theory to see if it still holds up, but that's not ignoring either --that's scientific method.

What I found really interesting is the extension to NV of the generic observation. Personally I hadn't considered MuFu's point that they are already in their current gen better positioned to go that route than ATI is in their current gen, and the kind of scanty top-level leakage we get at this point isn't at a level of detail to be helpful in stirring those tea leaves re them either. So NV doesn't need a new architecture to do it, which makes it quite doable for the refresh. Possibly that was painfully obvious to everyone else all along, and I'll apologize for inflicting my epiphany on the world. :wink:

phenix
07-Apr-2005, 20:00
As far as I understand, you guys mean that the amount of pixel/vertex data read from the video memory will not increase all that much compared to the increase in the number of instructions per pixel/vertex. Which means that memory bandwidth will have less and less effect on the performance of future VPUs.
But aren't we still a long way from completely ignoring fillrate issues and concentrating on the pure shader performance of 3D chips? Maybe we will not play at much higher resolutions in the near future, but we still cannot play at full resolution in modern games. Can we play 1600x1200 with 16xAF/6xAA at respectable fps in Far Cry, for example? Doesn't it mean that we still need more fill rate (more pixels) before we want only more shader power (better pixels)?
Besides, pixel resolution doesn't increase all that much, but what about the texture resolutions in modern games? Don't you think they increase at insane speeds (HL2, Doom3, U3), which translates to a need for more and more memory bandwidth?

Joe DeFuria
07-Apr-2005, 20:15
But aren't we still a long way from completely ignoring fillrate issues and concentrating on the pure shader performance of 3D chips?

Yes, I would agree with that.

Having said that, it's bandwidth (as always) that is the primary bottleneck to more fill rate. The overall decision to increase shader power faster than pixel fill rate power is driven at least in part (if not most significantly) by available memory bandwidth.

In other words, you're going to get more return on your investment in transistors going to shader power than you would putting them toward pixel fill rate.

Don't you think they increase at insane speeds (HL2, Doom3, U3), which translates to a need for more and more memory bandwidth?

I think you're looking at it the wrong way. We ALWAYS need more bandwidth. ;) There is only so much that nVidia / ATI can do to "increase" bandwidth. (Compression, wider busses, etc.) Other than that, they are at the mercy of memory producers to provide the "bandwidth."

So it's more like "Given that we are going to expect to have X GB/s of available bandwidth...what is the best way to spend our transistor budget to make use of it?" Throwing more pixel writing power at the problem when you are already bandwidth constrained isn't going to get you much return.

This is why there's an (apparent) drive to "decouple" shader pipelines from pixel writing pipelines.

On a related note, it appears that the short-term answer to increase bandwidth / pixel writing power is SLI.
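
A back-of-the-envelope version of the bandwidth argument above, treating the achieved pixel write rate as the lower of two ceilings: what the ROPs can emit, and what the memory bus can absorb. All the numbers (16 ROPs at 500MHz, 44.8 GB/s, 8 bytes of memory traffic per written pixel) are hypothetical, and the model ignores compression, caching and overdraw.

```python
def achieved_gpixels(rops, core_mhz, bandwidth_gbs, bytes_per_pixel):
    """Pixel write rate in Gpixels/s, limited by the ROPs or by memory bandwidth."""
    rop_limit = rops * core_mhz / 1000.0         # Gpixels/s the ROPs could emit
    bw_limit = bandwidth_gbs / bytes_per_pixel   # Gpixels/s the memory can absorb
    return min(rop_limit, bw_limit)


base = achieved_gpixels(16, 500, 44.8, 8)
more_rops = achieved_gpixels(32, 500, 44.8, 8)
print(base, more_rops)
```

With these made-up figures both cases land at the bandwidth ceiling (about 5.6 Gpixels/s), so doubling the ROP count buys nothing. That is the "not much return" case, and why the transistors arguably do more good spent on shader work.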

Hellbinder
07-Apr-2005, 20:23
Two common things that most people in the industry agree on: Instructions per texture are increasing and instructions per pixel are increasing.

I find it depressing that lots of people seem to be continually ignoring your hints such as the above, and (paraphrasing) "forget about the idea of a traditional pipeline...it's meaningless going forward."

It seems pretty obvious to me (based on yours and others hints), that we should not expect R520 to be a new pixel fill rate (pixel writes per cycle) behemoth relatively speaking. On the other hand "shader throughput per cycle" seems to be the fundamental change for next gen.

Exactly...

Still the bottom line that everyone will want to jaw about are the numbers the Thing kicks out in FPS.

(it wont be a slouch in pixel fill rate either, its simply not its primary target for performance which i have indicated before in other posts)

jvd
07-Apr-2005, 20:32
Well, you figure if it stays at 16 pipelines, increases the amount of pixel work it can do and its shader power, and only increases clock speeds for more fillrate, they should be in fine shape.

Figure on 90nm they should be able to hit 700MHz with the card. That would be 11200 Mpixels/s for the fill rate (16 x 700). That would easily drive Half-Life 2 at 1600x1200 and keep 100fps.
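
The arithmetic behind that, as a quick sketch: peak theoretical fill rate is just pipes times clock, and the raw number of pixels a 1600x1200 display needs at 100fps is width times height times frame rate. The function names are illustrative, and the comparison ignores overdraw, multiple passes, AA, shader cycles per pixel and memory bandwidth, so it is an upper bound rather than a prediction.

```python
def peak_fill_mpixels(pipes, core_mhz):
    """Peak theoretical fill rate in Mpixels/s: one pixel per pipe per clock."""
    return pipes * core_mhz


def required_mpixels(width, height, fps):
    """Pixels per second needed to draw every screen pixel once per frame, in Mpixels/s."""
    return width * height * fps / 1e6


peak = peak_fill_mpixels(16, 700)
need = required_mpixels(1600, 1200, 100)
print(peak)   # 11200
print(need)   # 192.0
print(round(peak / need, 1))
```

The ratio between the two is the headroom left for everything the simple model leaves out.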

Chalnoth
07-Apr-2005, 20:41
Well, you figure if it stays at 16 pipelines, increases the amount of pixel work it can do and its shader power, and only increases clock speeds for more fillrate, they should be in fine shape.
Er, no. Increasing shader power without increasing the number of pipelines is basically a waste, for the most part. Whenever you make one pipeline more powerful, it's less likely to make use of that additional power. It's much more efficient to have more pipelines, for the most part.

Joe DeFuria
07-Apr-2005, 21:10
Increasing shader power without increasing the number of pipelines is basically a waste, for the most part.

If you are shader, but not fill rate limited?

Whenever you make one pipeline more powerful, it's less likely to make use of that additional power. It's much more efficient to have more pipelines, for the most part.

I'm not sure what you're saying. In my experience, it's just the opposite. You tend to make use of the power that you have.

trinibwoy
07-Apr-2005, 21:33
If you are shader, but not fill rate limited?

This pretty much sums up my confusion on this matter. Since pixel-shading is currently part of the pipeline I find it difficult to make the distinction between shader and fill-rate limitations :oops: Can someone explain a bit?

Ostsol
07-Apr-2005, 22:00
I've generally thought that the term "fillrate" encompasses bandwidth, latency, and shader instruction throughput. . .

Chalnoth
07-Apr-2005, 22:01
I'll put it another way:
More shader power without more pipelines implies attempting to do more each clock cycle per pixel in-flight per clock. This is equivalent to increasing IPC in a CPU. The problem with this is that increasing shader power per pipeline falls victim to the same problems of increasing IPC in a CPU: it costs a hell of a lot of transistors to do it.

So, since GPU's are so easily parallelizable, it often makes much more sense just to have more pipelines (though there's obviously some happy medium somewhere: more pipelines means you need to store more data inside the GPU, so there is some optimal amount of IPC, which may be different with different tasks or different ways of generating higher IPC).

Now, shader pipelines don't have to be equated to the ability to read textures or output pixels. Shader pipelines are just a measure of the number of pixels that the architecture works on in parallel each clock cycle. The GeForce 6600, for instance, has 8 pixel shader pipelines, but can only output four pixels each clock cycle.
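
A toy throughput model of that decoupling, assuming output per clock is the lower of what the shader pipelines finish and what the ROPs can write. The 8-pipe/4-ROP case uses the GeForce 6600 figures given above; the 16/16 part is a hypothetical one-to-one design for comparison, and the cycles-per-pixel values are arbitrary.

```python
def pixels_per_clock(shader_pipes, rops, shader_cycles_per_pixel):
    """Pixels written per clock: limited by shading throughput or by ROP count."""
    shaded = shader_pipes / shader_cycles_per_pixel
    return min(shaded, rops)


for cycles in (1, 2, 4, 8):
    decoupled = pixels_per_clock(8, 4, cycles)     # 8 shader pipes, 4 ROPs
    one_to_one = pixels_per_clock(16, 16, cycles)  # hypothetical 16 pipes, 16 ROPs
    print(cycles, decoupled / 4, one_to_one / 16)  # fraction of ROPs kept busy
```

Once a pixel takes two or more shader cycles, the 8/4 arrangement still keeps all four ROPs busy, while the one-to-one part already leaves half of its ROPs idle. In this model the extra ROPs only pay off for very short shaders.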

ninelven
07-Apr-2005, 22:38
Chalnoth, nm... semantics

Geo
07-Apr-2005, 22:41
Having said that, it's bandwidth (as always) that is the primary bottleneck to more fill rate. The overall decision to increase shader power faster than pixel fill rate power is driven at least in part (if not most significantly) by available memory bandwidth.

<snip>

I think you're looking at it the wrong way. We ALWAYS need more bandwidth. ;) There is only so much that nVidia / ATI can do to "increase" bandwidth. (Compression, wider busses, etc.) Other than that, they are at the mercy of memory producers to provide the "bandwidth."

So it's more like "Given that we are going to expect to have X GB/s of available bandwidth...what is the best way to spend our transistor budget to make use of it?" Throwing more pixel writing power at the problem when you are already bandwidth constrained isn't going to get you much return.

This is why there's an (apparent) drive to "decouple" shader pipelines from pixel writing pipelines.

Well, way back here: http://www.beyond3d.com/forum/viewtopic.php?t=21341&postdays=0&postorder=asc&start=425

we have Ailuros' graph from GPU Gems on NV's view of the world. Of course, when you are doing a 10 year graph it is going to smooth out the dips and peaks in the multiple progression lines and affect the short-term relationships of those lines. As Ailuros noted, this graph would suggest NV sees transistor count scaling in close synch with bandwidth. I've always thot of NV as bleeding-edge bandwidth demons, and leaning more towards lining up their high-end releases to their bandwidth needs availability. Part of what startled me with MuFu's suggestion and Dave's observation is that they really are positioned now to do it the other direction for short term practical (read "need a competitive part we can sell right now") reasons.

Pete
07-Apr-2005, 22:48
But Chal, increasing IPC still costs less in terms of transistors than adding entire pipes, ROPs and all, no? It seems that eventually ATI will want to decouple "pipes" and ROPs, as nV has done, for efficiency's sake. 300M is a ton of transistors, but there's still not much sense in wasting space with underused hardware. It appears that, as nV demonstrated with the very capable NV43, ROPs will take a relative back seat to shader ops.

trinibwoy, I think people are separating "old-school" fillrate (ROPs) from shader "fillrate" (tho IPC seems a more accurate term than fillrate). Eh, as GPUs become more sophisticated, so will the terminology surrounding them. Not much different than what happened b/w Athlons and Pentiums, IMO.

Chalnoth
07-Apr-2005, 23:21
But Chal, increasing IPC still costs less in terms of transistors than adding entire pipes, ROPs and all, no?
Up to a point. This is why I stated that there's some amount of IPC that is optimal. And ROPs need not be connected to pipelines at all.

It appears that, as nV demonstrated with the very capable NV43, ROPs will take a relative back seat to shader ops.
Yes, but the NV43 still has 8 separate pixel pipelines. The NV43 isn't an example of attempting to increase IPC over its more expensive brethren, as its pixel pipelines are pretty much identical to the NV40's. There are just fewer of them, and a different number of (completely separate) ROP's.

Chalnoth
07-Apr-2005, 23:31
Well, way back here: http://www.beyond3d.com/forum/viewtopic.php?t=21341&postdays=0&postorder=asc&start=425

we have Ailuros' graph from GPU Gems on NV's view of the world. Of course, when you are doing a 10 year graph it is going to smooth out the dips and peaks in the multiple progression lines and affect the short-term relationships of those lines. As Ailuros noted, this graph would suggest NV sees transistor count scaling in close synch with bandwidth. I've always thot of NV as bleeding-edge bandwidth demons, and leaning more towards lining up their high-end releases to their bandwidth needs availability. Part of what startled me with MuFu's suggestion and Dave's observation is that they really are positioned now to do it the other direction for short term practical (read "need a competitive part we can sell right now") reasons.
Well, I just have to say that I personally find at least one aspect of that graph highly unlikely. It assumes a doubling in the number of transistors every three years. I just don't think that transistor counts are going to follow such a simple exponential rise over the next few years, since as densities get higher and higher, there are going to be more and more roadblocks preventing further increases in density and overall performance.

Edit:
Notice that though transistor counts may scale with bandwidth, clock speed is also predicted to increase, so that graph does indeed predict fillrate outstripping bandwidth.
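
To put a number on what that assumption implies: a fixed doubling period compounds exponentially, so the growth factor over a span is 2 raised to (years / doubling period). The three-year period is my reading of the graph and the ten-year span is how the graph was described earlier; the function name is just illustrative.

```python
def growth_factor(years, doubling_period_years):
    """Multiplicative growth after `years` given a fixed doubling period."""
    return 2 ** (years / doubling_period_years)


print(round(growth_factor(10, 3), 2))  # 10.08
```

So the curve amounts to roughly ten times the transistors over the decade, with no allowance for the process roadblocks mentioned above.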

trinibwoy
08-Apr-2005, 01:40
trinibwoy, I think people are separating "old-school" fillrate (ROPs) from shader "fillrate" (tho IPC seems a more accurate term than fillrate). Eh, as GPUs become more sophisticated, so will the terminology surrounding them. Not much different than what happened b/w Athlons and Pentiums, IMO.

Ok, I think I get it now. I had it a couple pages ago then I lost it again after a couple posts :)

Regarding IPC I tend to think NV40 has an advantage since they are competing with lower clocks. But in another thread it was demonstrated that R420 does better with using math ops to mask texture fetch latencies. It seems that the more complex these parts get the less we know about their inner workings :?

Joe DeFuria
08-Apr-2005, 02:00
I'll put it another way:
More shader power without more pipelines implies attempting to do more each clock cycle per pixel in-flight per clock.

Right.

This is equivalent to increasing IPC in a CPU.

Sort of, yes.

The problem with this is that increasing shader power per pipeline falls victim to the same problems of increasing IPC in a CPU: it costs a hell of a lot of transistors to do it.

But that's exactly what they've been doing for the past 20 years or so, isn't it? I think the prescott is one of the first processors (x86) to scale back IPC in favor of clock speed.

But in either case, they are both still working within the "confine" of memory bandwidth. (Hence, more and more cache to hide latencies and such).

So, since GPU's are so easily parallelizable, it often makes much more sense just to have more pipelines (though there's obviously some happy medium somewhere: more pipelines means you need to store more data inside the GPU, so there is some optimal amount of IPC, which may be different with different tasks or different ways of generating higher IPC).

Yes, I agree that given an amount of bandwidth, there is an optimal (in a generic sense) IPC level. This is highly dependent on the application, and with shaders gaining more and more prevalence, I see the "optimal" IPC being increasingly higher than it is with today's GPUs.

Fodder
08-Apr-2005, 03:20
I think the prescott is one of the first processors (x86) to scale back IPC in favor of clock speed.
That was the design philosophy of the whole Netburst/P4 architecture.

Chalnoth
08-Apr-2005, 03:54
But that's exactly what they've been doing for the past 20 years or so, isn't it? I think the prescott is one of the first processors (x86) to scale back IPC in favor of clock speed.
Well, that's a pretty stupid thing to do, though, because maximum clockspeed is limited by the physics. The fastest of the P4's are surely feeling this limit now. Higher clock speed takes more transistors, too.

Yes, I agree that given an amount of bandwidth, there is an optimal (in a generic sense) IPC level. This is highly dependent on the application, and with shaders gaining more and more prevalence, I see the "optimal" IPC being increasingly higher than it is with today's GPUs.
Um, bandwidth, no. I was talking more in terms of transistor count. More pipelines means more cache and functional units are required. Higher IPC means you're wasting functional units, but cache requirements remain about the same.

"Optimal" IPC will decrease instead of increase in the future, because the more general purpose nature of GPU's will require them to be good for more and more disparate tasks. If they only had to run one specific fragment/vertex program, there would be no problem: you could bump IPC as high as you wanted without issue. But since things are getting more general, it's going to be harder for IHV's to predict what game developers are going to want to do, and thus "IPC" should come down (or, at least, not increase) in favor of more pipelines.

t0y
08-Apr-2005, 06:51
I don't think I'm up to par with most of the contributors of this long thread but I'll post my opinion anyway... :D

With the advent of dynamic branching it's only normal that the traditional parallelization of pixel shaders will be replaced by parallelization of shader instructions. The only remnants of pipelines will just be the state of each of the individual pixels whose shader instructions are sharing the ALUs, texture units, etc... The scheduler will be the most important part of the GPU due to the importance of assigning the multiple resources optimally and issuing adequate prefetch data requests to hide latency.

This is the only way to take full advantage of the hardware while, at the same time, having shaders branching "randomly" with impossible to predict behaviour at compile-time.

OTOH, I don't believe this new generation will bring all of this, maybe some kind of compromise. It's not like dynamic branching is ready for prime time with all the fixed-pipeline code still in use, but I really can't see what kind of arrangement they'll come up with and still increase current games' performance. Maybe (hopefully) they will surprise us and the hints from Dave will suddenly make sense. ;)

I can only imagine the waste of processing power on an NV40 with dynamic code these days. The pipeline has to be broken sooner or later...
Don't you guys agree?

Ailuros
08-Apr-2005, 10:27
Well, way back here: http://www.beyond3d.com/forum/viewtopic.php?t=21341&postdays=0&postorder=asc&start=425

we have Ailuros' graph from GPU Gems on NV's view of the world. Of course, when you are doing a 10 year graph it is going to smooth out the dips and peaks in the multiple progression lines and affect the short-term relationships of those lines. As Ailuros noted, this graph would suggest NV sees transistor count scaling in close synch with bandwidth. I've always thot of NV as bleeding-edge bandwidth demons, and leaning more towards lining up their high-end releases to their bandwidth needs availability. Part of what startled me with MuFu's suggestion and Dave's observation is that they really are positioned now to do it the other direction for short term practical (read "need a competitive part we can sell right now") reasons.
Well, I just have to say that I personally find at least one aspect of that graph highly unlikely. It assumes a doubling in the number of transistors every three years. I just don't think that transistor counts are going to follow such a simple exponential rise over the next few years, since as densities get higher and higher, there are going to be more and more roadblocks preventing further increases in density and overall performance.

Edit:
Notice that though transistor counts may scale with bandwidth, clock speed is also predicted to increase, so that graph does indeed predict fillrate outstripping bandwidth.

If you look a bit closer on the graph and the closer future, you might notice that that particular estimate is damn close for the short term to Mufu's relation chart a couple of pages ago.

With current GDDR3 you might get close to 45GB/sec for roughly 10GPixels/sec and with the appearance of GDDR4 anything above 50GB/sec (if not quite a lot more) for 10-12GPixels/sec.

As far as the doubling of transistors every 3 years is concerned, I'd love to see a chart where it hasn't been the case so far for the past years. NV30 wasn't 3 years apart from NV40 and yet I could see transistor count scaling from 120M to 222M. How many transistors do you really estimate any NV4x refresh would have, or more specifically the first 90nm based release in late 2005?
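A rough sanity check of the 120M to 222M jump against a "doubling every three years" curve can be sketched as follows. The transistor counts are the ones quoted above; the helper name and the assumption of a clean exponential are illustrative only.

```python
import math

# Figures quoted in the post: NV30 ~120M transistors, NV40 ~222M.
NV30_TRANSISTORS = 120e6
NV40_TRANSISTORS = 222e6


def years_to_reach(ratio, doubling_period_years=3.0):
    """Years needed to grow by `ratio` if counts double every `doubling_period_years`."""
    return doubling_period_years * math.log2(ratio)


growth = NV40_TRANSISTORS / NV30_TRANSISTORS
print(f"NV30 -> NV40 growth factor: {growth:.2f}x")
print(f"time a 3-year doubling curve needs for that: {years_to_reach(growth):.2f} years")
```

Under that assumption the curve would need a bit over two and a half years for this growth, so any shorter gap between the two parts means the step was ahead of the curve.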

Finally, clockspeed increases in that estimate as conservatively as it has up to now. That's an estimate of 4x the clockspeed in a decade and a bit less than 10x today's bandwidth. I've no idea how quads are going to scale in a decade, but keeping the 10GPixels to 44GB/sec relation, with ~60GPixels the estimated bandwidth with the same factor is ~260GB/sec.
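The extrapolation above is a straight proportionality, which can be written out as a small sketch. The 10 GPixels/sec and 44 GB/sec reference points come from the post; the function name is made up for illustration.

```python
def bandwidth_for_fillrate(fillrate_gpix, ref_fillrate_gpix=10.0, ref_bandwidth_gb=44.0):
    """Scale bandwidth linearly with fillrate, keeping bytes-per-pixel constant."""
    bytes_per_pixel = ref_bandwidth_gb / ref_fillrate_gpix  # 4.4 bytes of bandwidth per pixel
    return fillrate_gpix * bytes_per_pixel


for gpix in (10.0, 12.0, 60.0):
    print(f"{gpix:5.1f} GPixels/s -> {bandwidth_for_fillrate(gpix):6.1f} GB/s")
```

At 60 GPixels/sec this gives roughly 264 GB/sec, in line with the ~260 GB/sec figure.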

....so that graph does indeed predict fillrate outstripping bandwidth...

There are too many details missing to even come to such conclusions. If that estimate has a maximum of 8 quads in mind, the fillrate to bandwidth relation doesn't change a bit compared to today's ultra high end accelerators. Bandwidth has been scaling on GPUs as expected in the recent past, and it would probably be even higher if higher-specced memory were available in higher capacities, especially for R420/480.

Ailuros
08-Apr-2005, 10:44
Well you figure if it stays at 16 pipelines and increases the amount of pixel work it can do and its shader power and only increases clock speeds for more fillrate they should be in fine shape.
Er, no. Increasing shader power without increasing the number of pipelines is basically a waste, for the most part. Whenever you make one pipeline more powerful, it's less likely to make use of that additional power. It's much more efficient to have more pipelines, for the most part.

Based on what data exactly? I could see a 3 quad@475MHz X800PRO being more or less on the same performance level as a 4 quad@350MHz 6800GT.

I don't see much of a difference theoretically whether 4 quads@600MHz, or 6 quads@400MHz; maybe just maybe if such scenarios have anything to do with real upcoming GPUs, I wouldn't even exclude the latter to have 2 VS engines more in order to compensate for the higher geometry throughput (think =/>650MHz * 8 vs. let's say =/>400MHz * 10).
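The quad/clock pairs above can be compared by peak pixel throughput, assuming 4 pixels per quad per clock and ignoring every other bottleneck. This is a sketch of the arithmetic only, not a performance model.

```python
PIXELS_PER_QUAD = 4  # one quad = a 2x2 block of pixels


def peak_fillrate_mpix(quads, core_mhz):
    """Theoretical peak fillrate in MPixels/s for a given quad count and core clock."""
    return quads * PIXELS_PER_QUAD * core_mhz


configs = {
    "X800PRO, 3 quads @ 475MHz": (3, 475),
    "6800GT, 4 quads @ 350MHz": (4, 350),
    "hypothetical 4 quads @ 600MHz": (4, 600),
    "hypothetical 6 quads @ 400MHz": (6, 400),
}
for name, (quads, mhz) in configs.items():
    print(f"{name}: {peak_fillrate_mpix(quads, mhz)} MPixels/s")
```

The first pair lands within about 2% of each other, and the two hypothetical configurations are identical on paper, which is the point being made.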

If yes then I'd also like to see whether NV's first 90nm part in late 2005, will have 4 or more quads in the end.

By the way I was living under the impression that the more units on board, the longer it takes for them to communicate from one end to the other.

Jawed
08-Apr-2005, 12:38
As you increase quads, each quad is getting less texturing memory bandwidth, if overall memory bandwidth remains unchanged.

If texturing 16 pixels at 450MHz in NVidia's architecture uses all of 600MHz memory's bandwidth (for the sake of argument), two extra quads'-worth of pixels (i.e. another 8 pixels) are going to stall waiting for texturing at the same bandwidth.

Two extra quads effectively requires a 50% increase in texturing bandwidth. What proportion of the overall VPU memory bandwidth that amounts to isn't clear to me...

Pretty much the same argument applies when increasing core speed. If you increase core speed by 20% then you need 20% extra texturing memory bandwidth or you'll get texturing stalls (for the same piece of code).
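The scaling argument above, written as a sketch: if texturing demand is proportional to pixels in flight per second, then it scales with quad count times core clock. The 4 quad, 450MHz baseline is the "for the sake of argument" case from the post; the function is hypothetical.

```python
def texture_bandwidth_scale(quads, core_mhz, base_quads=4, base_core_mhz=450):
    """Texturing bandwidth needed relative to the baseline, for the same shader code."""
    return (quads / base_quads) * (core_mhz / base_core_mhz)


# Two extra quads at the same clock.
print(f"6 quads @ 450MHz: {texture_bandwidth_scale(6, 450):.2f}x baseline")
# Same quad count, 20% higher core clock.
print(f"4 quads @ 540MHz: {texture_bandwidth_scale(4, 540):.2f}x baseline")
```

Anything above 1.0x with unchanged memory bandwidth means texturing stalls in this simplified model.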

Obviously it's always possible to create code that will stall pixel pipelines due to texture fetches. Indeed we seem to have a thread on that very subject right now...

But you can't build an un-balanced architecture with extra pixel pipelines which can't be supported by texturing bandwidth. You'll simply increase the chances of stalling a pipeline.

Virtual memory appears to be the next big win for texturing bandwidth. Until that appears, it seems to me we're stuck with 4 quads.

Of course if you're unable to make 90nm run fast, because low-k is something you've never done before, then maybe you've got no choice but to go with 5 or 6 quads.

Jawed

Chalnoth
08-Apr-2005, 12:43
As you increase quads, each quad is getting less texturing memory bandwidth, if overall memory bandwidth remains unchanged.
1. Overall memory bandwidth will still be increasing, though not by as much.
2. Higher ALU/texture op ratios in the future will help to alleviate this.

Jawed
08-Apr-2005, 13:01
Well there's plenty of talk of 700MHz memory this summer and it seems 800MHz memory by the end of the year - actually being used, not just demonstrated. GDDR3 is reckoned to be good for 1GHz.
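For a sense of what those memory clocks mean in GB/sec, here is a minimal sketch. It assumes a 256-bit bus and double data rate signalling (two transfers per clock); the bus width is an assumption for illustration, not something stated in this post.

```python
def peak_bandwidth_gb_s(memory_mhz, bus_width_bits=256, transfers_per_clock=2):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_clock * memory_mhz * 1e6 / 1e9


for mhz in (700, 800, 1000):
    print(f"{mhz}MHz GDDR3 on a 256-bit bus: {peak_bandwidth_gb_s(mhz):.1f} GB/s")
```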

Higher ALU/texture op ratios is a given. That's the noise Huddy was making at GDC. He's saying "DO MORE MATH".

Average best case is 2 ALU ops per clock in X800 hardware. Absolute best case is 5 ALU ops per cycle. Clearly that's down to instruction dependencies in shader code.

Jawed

Chalnoth
08-Apr-2005, 13:04
Average usage is much closer to 1 ALU op, in any current hardware.

Rys
08-Apr-2005, 14:00
Well there's plenty of talk of 700MHz memory this summer and it seems 800MHz memory by the end of the year - actually being used, not just demonstrated. GDDR3 is reckoned to be good for 1GHz.

Higher ALU/texture op ratios is a given. That's the noise Huddy was making at GDC. He's saying "DO MORE MATH".

Average best case is 2 ALU ops per clock in X800 hardware. Absolute best case is 5 ALU ops per cycle. Clearly that's down to instruction dependencies in shader code.

Jawed

I think he was just talking about the vertex shader, and dual issuing vertex ops hardly happens (whereas you can dual issue more easily in fragment programs).

1 ALU op per clock seems about right, at least for the vertex hardware.

ferro
08-Apr-2005, 14:04
Instead of increasing memory bandwidth, one can also decrease memory bandwidth requirements. In the CPU world, performance does not depend much on memory bandwidth, because of the effectiveness of the 2nd level cache. What if a GPU has a 2MB cache on-chip that is used for the memory that consumes the most bandwidth (frame buffer + z buffer?)? What if the entire frame buffer is on-chip? It might explain the following Dave Baumann hint:


Well, probably more a fragment switch than a crossbar, but still new logic compared to R420. Still a crossbar for the memory controller, though.

Not necessarily.

A crossbar memory controller adds a lot of complexity (duplication of the address bus) to make memory access more efficient. An effective cache or on-chip frame buffer might remove the need for a crossbar memory controller, and the need for more external memory bandwidth.

If NV4x and R4XX already have such a cache, this is all nonsense of course.
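To put the on-chip frame buffer idea in numbers, here is a rough sizing sketch. The 1600x1200 resolution and the 32-bit colour plus 32-bit Z layout without AA are assumptions picked for illustration; only the 2MB cache size comes from the post.

```python
MIB = 2 ** 20


def framebuffer_bytes(width, height, color_bytes=4, z_bytes=4, samples=1):
    """Storage for colour + Z at the given resolution and sample count."""
    return width * height * samples * (color_bytes + z_bytes)


def pixels_in_cache(cache_bytes, color_bytes=4, z_bytes=4):
    """How many pixels of colour + Z fit in a cache of the given size."""
    return cache_bytes // (color_bytes + z_bytes)


full = framebuffer_bytes(1600, 1200)
print(f"1600x1200 colour+Z: {full / MIB:.2f} MiB")
print(f"pixels that fit in 2 MiB: {pixels_in_cache(2 * MIB)} (a 512x512 region)")
```

Under these assumptions a 2MB cache covers only a region of the screen at a time, while holding the whole frame on-chip would need several times that.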

digitalwanderer
08-Apr-2005, 15:37
Ok, I heard a rumor but it's a strong rumor from a reliable source:

it's going to be poinsettia red leaning a tad towards rosepetal. http://mastabeta.com/forum/images/smilies/yep.gif

Xmas
08-Apr-2005, 16:08
A crossbar memory controller adds a lot of complexity (duplication of the address bus) to make memory access more efficient. An effective cache or on-chip frame buffer might remove the need for a crossbar memory controller, and the need for more external memory bandwidth.

If NV4x and R4XX already have such a cache, this is all nonsense of course.
I think you still need a split memory interface. A cache can't completely remove the inefficiencies due to a wide memory interface.
GDDR3 has a burst length of 4, IIRC, so with a single wide 256-bit interface the minimum data size per access is 1024 bits.
Assuming frame buffer tile size is 4x4, and compression only removes the need to store identical samples with AA, such a tile takes 16*32 = 512 bits. So every time you access one tile but not the next, you waste half of the data read. And this gets much worse with texture reads, as a single DXT1 tile is 64 bits only.
Page breaks are a problem, too.

A split memory interface has its inefficiencies, too, like if you only access data from one channel. That's why you have to put a lot of thought in how you distribute different data sets (texture, vertex, color and Z buffer) across different channels, whether to use striping or not, etc.
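The burst arithmetic above can be sketched as follows. The 256-bit bus, burst length of 4, 512-bit frame buffer tile and 64-bit DXT1 tile are the figures from the post; the split into four 64-bit channels is a hypothetical configuration added for comparison.

```python
import math


def wasted_fraction(useful_bits, channel_width_bits, burst_length=4):
    """Fraction of fetched data that is not needed when reading `useful_bits`."""
    min_access = channel_width_bits * burst_length
    fetched = math.ceil(useful_bits / min_access) * min_access
    return 1 - useful_bits / fetched


FB_TILE_BITS = 16 * 32   # 4x4 tile, 32 bits per pixel
DXT1_TILE_BITS = 64      # one DXT1 block

for width in (256, 64):  # single wide interface vs. one channel of a 4x64-bit split
    print(f"{width}-bit channel: "
          f"frame buffer tile wastes {wasted_fraction(FB_TILE_BITS, width):.0%}, "
          f"DXT1 tile wastes {wasted_fraction(DXT1_TILE_BITS, width):.0%}")
```

This only counts isolated accesses; neighbouring tiles that are actually used, caching and page breaks would all change the real figure.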

Jawed
09-Apr-2005, 11:18
Average usage is much closer to 1 ALU op, in any current hardware.

http://www.beyond3d.com/forum/viewtopic.php?p=327176#327176

In long shaders you're going to get around 2 ops per cycle.

Lots of fun things in that thread :lol:, e.g.

There are things happening at Crytek that you guys don't know (but of which you would jump up and down screaming if you do know).
Jawed

Ailuros
09-Apr-2005, 11:37
As you increase quads, each quad is getting less texturing memory bandwidth, if overall memory bandwidth remains unchanged.
1. Overall memory bandwidth will still be increasing, though not by as much.
2. Higher ALU/texture op ratios in the future will help to alleviate this.

1. I doubt quads will scale with the same rhythm for the next decade, as they did for the past two generations. Is anyone expecting that the amount of quads is going to double every generation or every 2-3 years?

2. No objection concerning the ALU/texture op ratios, yet that still won't mean that we'll have beyond 70GPixels/sec fill-rate with just 100GB/sec bandwidth.

Virtual memory appears to be the next big win for texturing bandwidth. Until that appears, it seems to me we're stuck with 4 quads.

Is it really that impossible that we'll see early attempts prior to WGF2.0 GPUs? It looks like they might be something like a year apart from R5xx or whatever else.

Of course if you're unable to make 90nm run fast, because low-k is something you've never done before, then maybe you've got no choice but to go with 5 or 6 quads.

I doubt that's the real reason; if there's going to be a 110nm refresh, switching from 130 to 110nm (TSMC) sounds a lot easier to me.

Jawed
09-Apr-2005, 12:13
Virtual memory appears to be the next big win for texturing bandwidth. Until that appears, it seems to me we're stuck with 4 quads.

Is it really that impossible that we'll see early attempts prior to WGF2.0 GPUs? It looks like they might be something like a year apart from R5xx or whatever else.

Well I'd like to think that virtual memory is intrinsic to R500. Who knows, eh?

Of course if you're unable to make 90nm run fast, because low-k is something you've never done before, then maybe you've got no choice but to go with 5 or 6 quads.

I doubt that's the real reason; if there's going to be a 110nm refresh, switching from 130 to 110nm (TSMC) sounds a lot easier to me.

But there's noise about NVidia going to 90nm this year. Summer or winter, who knows? That's all I'm thinking of.

ATI got burnt going from low-k to non-low-k with the 110nm X700 cards. NVidia certainly gets better clocks on this process.

Jawed

Chalnoth
09-Apr-2005, 13:15
Average usage is much closer to 1 ALU op, in any current hardware.

http://www.beyond3d.com/forum/viewtopic.php?p=327176#327176

In long shaders you're going to get around 2 ops per cycle.
Um, where in that thread did you get this information?

nAo
09-Apr-2005, 13:41
Um, where in that thread did you get this information?
There's a 100+ instruction shader that can be executed in less than 50 cycles according to NVIDIA's compiler.

Jawed
09-Apr-2005, 14:10
Look at the asm code generated, and compare the different SM targets and precisions. Anywhere from 47 to 65 cycles depending, on NV40. That's for about 103 instructions. Though about 13 of those instructions are declarations and I'm not sure how they consume cycles.

Forced FP32 is causing that particular shader to run 37% slower than the default compile on NV40. Ouch.

NV35 is running the same shader in 69 to 181 cycles, depending. Double-ouch. Erm, actually, make that triple-ouch.
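Turning those cycle counts into ops per cycle is simple division; a sketch follows. It takes the instruction count as 103 and uses the cycle ranges quoted above. Which compile target each endpoint belongs to is not stated here, so the slowdown between the two NV40 endpoints is only indicative.

```python
INSTRUCTIONS = 103  # instruction count quoted for the shader


def ops_per_cycle(cycles, instructions=INSTRUCTIONS):
    """Average instructions retired per cycle for a given cycle count."""
    return instructions / cycles


def slowdown(slow_cycles, fast_cycles):
    """Relative slowdown of the slower compile, as a fraction."""
    return slow_cycles / fast_cycles - 1


for chip, (best, worst) in {"NV40": (47, 65), "NV35": (69, 181)}.items():
    print(f"{chip}: {ops_per_cycle(worst):.2f} to {ops_per_cycle(best):.2f} ops/cycle")

print(f"NV40 worst vs best: {slowdown(65, 47):.0%} slower")
```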

Jawed

nAo
09-Apr-2005, 14:18
That's for about 103 instructions. Though about 13 of those instructions are declarations and I'm not sure how they consume cycles.
Declarations don't (directly) consume cycles and that shader is 103 instructions long, declarations not included.

Geo
09-Apr-2005, 14:40
Okay, so why is Dave's sig now about London, "The Littlest Hobo"? Deconstruct, please. Tired of having his sig deconstructed, perhaps? :lol:

Ohhh. I think I get it. A different message this time. London isn't anyone's pet, and repeatedly refuses to be. Never mind. Carry on.

Ailuros
09-Apr-2005, 18:11
Well I'd like to think that virtual memory is intrinsic to R500. Who knows, eh?

No idea. Just trying to keep an open mind until announcement. The recent speculative digit-life write up got me thinking...


But there's noise about NVidia going to 90nm this year. Summer or winter, who knows? That's all I'm thinking of.

www.beyond3d.com

Michael Hara:

“Well, from an architecture standpoint we’re just still at the beginning of shader model 3.0. And we need to give the programmers out there some time to continue to really learn about that architecture. So in the spring refresh what you’ll see is a little bit faster versions...

...I think you’ll see the industry move up a little bit in performance. But I don’t think you’ll see any radical changes in architecture. I doubt you’ll see any radical changes in architecture even in the fall. When we came out with GeForce 6, we tend to create a revolutionary architecture about every actually two years. And then we derive from it for the following time. So even the devices that we announced this fall, that will be I think a lot more powerful than the ones we actually had a year ago. Architecturally we’re still in the shader model three type era.”

“If you look at when we go to 90, my guess will be is we’ll have one or two products this year going from 90 in the second half”.


ATI got burnt going from low-k to non-low-k with the 110nm X700 cards. NVidia certainly gets better clocks on this process.

I doubt there's going to be much difference in clockspeed in a hypothetical 110nm/6 quad case scenario compared to NV40. The NV42 mobile part (3 quads/110nm) is clocked currently at 450MHz. NV43 (2 quads) = 500MHz.

Jawed
11-Apr-2005, 00:49
Since I did one of these "ALU Transistor Guesstimates" for the R500, I thought I'd do one for R520, just cos I can.

http://www.cupidity.f9.co.uk/b3d03.jpg

I'm guessing the major difference with R520 is that subject to register dependency you'll always get a double-issue across the pairs of vector and/or scalar ALUs in the pixel shader. I only say that because I have a feeling that a double-issue of vector ops isn't possible in R420. But, ahem, I'm not clear on that.

So I'm guessing that the end result is that R520 will do more vector ops per cycle than R420, on average. This might be the only reason why R520 is, on average, faster per clock than R420.

The "transistor multiplier" column is just a rough guesstimate for the count of transistors that are required in the ALUs. It assumes that there's no difference in transistor count for an ALU regardless of shader model level. Which is plainly crap. So, just a starting point...

Jawed

Geo
11-Apr-2005, 01:57
So mebbee 25% for clock and 25% for efficiencies relative to X800?

Chalnoth
11-Apr-2005, 02:50
25% for efficiencies would be quite a lot. Today's architectures are already highly efficient.

ninelven
11-Apr-2005, 03:12
Depends on how you look at it...

Chalnoth
11-Apr-2005, 04:05
You could always gain even more efficiency through the use of more efficient shader models, but that requires programming for the new architecture. Personally, I'd rather only (currently) discuss efficiency in the following two ways:
1. Performance efficiency in current games.
2. Feature efficiency vs. other similar architectures.

Geo
11-Apr-2005, 04:48
25% for efficiencies would be quite a lot. Today's architectures are already highly efficient.

What's the per clock efficiency advantage of the 6800 Ultra vs the X800 XT PE?

This would be a rhetorical question. :lol:

Chalnoth
11-Apr-2005, 06:10
That's no longer interesting, anyway, because we're approaching a new generation. When the next gen comes up, then it will be an interesting question again, because then we'll have more food for speculation on future iterations of the respective architectures.

Geo
11-Apr-2005, 06:24
That's no longer interesting, anyway, because we're approaching a new generation. When the next gen comes up, then it will be an interesting question again, because then we'll have more food for speculation on future iterations of the respective architectures.

Gee, Chal, I'm right so rarely yet you still have to sidestep? :)

Switching subjects, anyone care to back away from the 300-350M transistor range for R520? Wavey has already disclaimed allegiance to it explicitly (without using the number, but still).

Given the Goldman analyst "smaller die" prediction, and some interesting-yet-now-missing speculation upstream re R580, anyone leaning more 250-300Mish for R520? Anyone think it will be *less* than 250M?

Chalnoth
11-Apr-2005, 06:40
I'm not trying to sidestep. I just don't care anymore.

ninelven
11-Apr-2005, 07:13
*cough*BS*cough*

Tim Murray
11-Apr-2005, 07:27
Given the Goldman analyst "smaller die" prediction, and some interesting-yet-now-missing speculation upstream re R580, anyone leaning more 250-300Mish for R520? Anyone think it will be *less* than 250M?
Well yeah, if they clock the living shit out of it. That would almost certainly be a 16 pipeline part.

hovz
11-Apr-2005, 08:10
ill go ahead and say 225ish. i dont see ati rly pushing any boundaries transistor wise or performance wise when they are going to a new process and nvidia isnt launching anything new.

Rys
11-Apr-2005, 09:07
I'm guessing the major difference with R520 is that subject to register dependency you'll always get a double-issue across the pairs of vector and/or scalar ALUs in the pixel shader. I only say that because I have a feeling that a double-issue of vector ops isn't possible in R420. But, ahem, I'm not clear on that.

You certainly can dual-issue in the vertex hardware in R420, it's just not that common. Basically you should assume the shader compiler for vertex programs is going to suck, relatively speaking, compared to the output it'll produce for fragment programs.

Transistor wise, just under 300M is my guess.

Dave Baumann
11-Apr-2005, 09:16
Dual issue or Co-issue? ;)

Geo
11-Apr-2005, 09:33
Transistor wise, just under 300M is my guess.

Yeah, that's where I am 2. So to speak. :lol:

Ailuros
11-Apr-2005, 09:36
Given the Goldman analyst "smaller die" prediction, and some interesting-yet-now-missing speculation upstream re R580, anyone leaning more 250-300Mish for R520? Anyone think it will be *less* than 250M?

geo,

There are always bits of information that leak through; it depends what each bit of information concerns and if it won't cause for confusion while being channelled through who knows what half-way ignorant receivers.

Assume R500/Xenon being a 4 quad/unified architecture with ~10MB eDRAM; how do you figure would the transistor count look like on that one, especially considering the amount of transistors the embedded memory might consume?
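As a very rough lower bound on what ~10MB of embedded memory costs in transistors, one can count storage cells only. The sketch below assumes a one-transistor (1T1C) DRAM cell and 1MB = 2^20 bytes, and ignores sense amplifiers, decoders and any logic placed next to the memory; the 6-transistor SRAM line is included purely for contrast.

```python
MIB = 2 ** 20


def storage_cell_transistors(megabytes, transistors_per_bit):
    """Transistors in the storage cells alone for a memory of the given size."""
    bits = megabytes * MIB * 8
    return bits * transistors_per_bit


edram = storage_cell_transistors(10, transistors_per_bit=1)  # 1T1C DRAM cell
sram = storage_cell_transistors(10, transistors_per_bit=6)   # classic 6T SRAM cell
print(f"10MB eDRAM cells: ~{edram / 1e6:.0f}M transistors")
print(f"10MB 6T SRAM cells: ~{sram / 1e6:.0f}M transistors")
```

Whether such cells would be counted in a headline transistor figure at all is a separate question this sketch does not address.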

The initial clockspeed estimates for the Xenon GPU were around 500+MHz; no idea of course if it'll turn out at that rate or even higher. Although transistor count isn't necessarily directly connected to core frequency, I rather have a 650-700MHz clockspeed in mind for R520/PC. I might be completely wrong, but so far the above makes more sense than anything else.

edit: ooops my guess would be =/>250M.

Jawed
11-Apr-2005, 09:36
I'm guessing the major difference with R520 is that subject to register dependency you'll always get a double-issue across the pairs of vector and/or scalar ALUs in the pixel shader. I only say that because I have a feeling that a double-issue of vector ops isn't possible in R420. But, ahem, I'm not clear on that.

You certainly can dual-issue in the vertex hardware in R420, it's just not that common. Basically you should assume the shader compiler for vertex programs is going to suck, relatively speaking, compared to the output it'll produce for fragment programs.

Transistor wise, just under 300M is my guess.
Sorry, I was referring solely to the PS1.4 and PS2.0 vector units in the pixel shader.

Double-issuing in the vertex shader between its two vector ALUs is a given. Otherwise they'd be pretty pointless as a pair.

In Richard Huddy's GDC05 presentation he was specific about only being able to issue up to 5 scalar ops in a cycle, in pixel shader code. That implies to me that you get 3 ops from the PS2.0 ALU, and then 1 each from the two scalar ALUs. The PS1.4 ALU is high and dry. If the PS1.4 vector ALU was able to contribute, you wouldn't be limited to 5 ops.

The PS1.4 ALU, according to:

http://www.beyond3d.com/reviews/ati/r420_x800/index.php?p=8

can do some things that the PS2.0 ALU can't. Anyone know what? 3Dc?

Jawed

Jawed
11-Apr-2005, 09:42
Dual issue or Co-issue? ;)

Well since you count ALUs differently from ATI (you count 2, ATI counts 4 math ALUs in the pixel shader) ...

Jawed

Rys
11-Apr-2005, 09:57
Dual issue or Co-issue? ;)

Bah, just one vector, one scalar ALU (I was thinking the vertex hardware was more capable than it was) per unit.

2 ops, co-issue. You know, for the longest time, I've been somewhat assuming you could do two vector ops per clock, per unit, in the vertex hardware for R420.

Blazkowicz
11-Apr-2005, 10:16
Given the Goldman analyst "smaller die" prediction, and some interesting-yet-now-missing speculation upstream re R580, anyone leaning more 250-300Mish for R520? Anyone think it will be *less* than 250M?

geo,

There are always bits of information that leak through; it depends what each bit of information concerns and if it won't cause for confusion while being channelled through who knows what half-way ignorant receivers.

Assume R500/Xenon being a 4 quad/unified architecture with ~10MB eDRAM; how do you figure would the transistor count look like on that one, especially considering the amount of transistors the embedded memory might consume?

The initial clockspeed estimates for the Xenon GPU were around 500+MHz; no idea of course if it'll turn out at that rate or even higher. Although transistor count isn't necessarily directly connected to core frequency, I rather have a 650-700MHz clockspeed in mind for R520/PC. I might be completely wrong, but so far the above makes more sense than anything else.

edit: ooops my guess would be =/>250M.

rumor (or fact?) for R500 was 8 pipes / 500mhz

nAo
11-Apr-2005, 10:22
R500 can output 8 (2x multisampled AFAIK) pixels per clock, but it hasn't 8 'pixel pipelines'.

Blazkowicz
11-Apr-2005, 10:28
of course, speaking of pixel pipelines does not make sense anymore I guess

Dave Baumann
11-Apr-2005, 10:51
Well since you count ALUs differently from ATI (you count 2, ATI counts 4 math ALUs in the pixel shader) ...

No, we're talking about VS.

R500 can output 8 (2x multisampled AFAIK) pixels per clock, but it hasn't 8 'pixel pipelines'.

That depends on the way you look at it. If you think about Voodoo as a "traditional pipeline" you have a texture unit and a pixel unit as separate chips - modern day ROP's (which will still feature on upcoming hardware) effectively are the equivalent of the pixel unit, the maths element is just a bunch of crap slapped in between the texture unit and the ROP's. This becomes more obvious in an architecture like Xenon graphics, but current parts are the ones that are confusing it because the maths pipelines, texture pipelines and pixel pipelines (ROP's) all correlate fairly closely. ;)

nAo
11-Apr-2005, 11:02
This becomes more obvious in an architecture like Xenon graphics, but current parts are the ones that are confusing it because the maths pipelines, texture pipelines and pixel pipelines (ROP's) all correlate fairly closely
Yeah, I know. Unfortunately it seems telling this 10 times doesn't make any difference at all.. ;)

Jawed
11-Apr-2005, 11:10
I think this picture sums up what Dave's saying quite nicely:

http://www.beyond3d.com/previews/nvidia/nv40/images/quad_pipeline.gif

Jawed

Dave Baumann
11-Apr-2005, 11:17
This is probably a better example (http://www.beyond3d.com/previews/nvidia/nv43/index.php?p=7), however even then the correlation is quite close, and I would expect this to change further with future architectures.

Jawed
11-Apr-2005, 11:42
[0030] FIG. 5 illustrates a block diagram representing the further execution of the command threads upon completion of all embedded commands therein. The ALU 308 is coupled to a render backend 350 via connection 352 and to a scan converter 356 via connection 358. As recognized by one having ordinary skill in the art, the ALU 308 may be operably coupled to the render backend 350 such that the bus 352 incorporates one or more of a plurality of connections for providing the completed command thread, such as command thread 316 of FIG. 4, thereto. Furthermore, as recognized by one having ordinary skill in the art, ALU 308 may be operably coupled to the scan converter 356 such that the connection 358 may be one or more of a plurality of connections for providing the executed command thread, such as command thread 322 of FIG. 4, to the scan converter 356. As discussed above, once the command thread, have an indicator bit, such as the done flag, set indicating all of the commands in the thread have been executed, the completed command thread is further provided in the processing pipeline. Moreover, the render backend 350 may be any suitable rendering backend for graphics processing as recognized by one having ordinary skill in the art. The scan converter 356 may be any suitable scan converter for graphics processing as recognized by one having ordinary skill in the art.

http://www.cupidity.f9.co.uk/b3d04.jpg

My understanding is that SM3 requires fog and blend to be implemented in shader code, so that's one function that won't be appearing in the render backend.

Jawed

Jawed
11-Apr-2005, 12:29
Dual issue or Co-issue? ;)

Bah, just one vector, one scalar ALU (I was thinking the vertex hardware was more capable than it was) per unit.

2 ops, co-issue. You know, for the longest time, I've been somewhat assuming you could do two vector ops per clock, per unit, in the vertex hardware for R420.

Ah, sod it, just realised I was under the impression the two ALUs in the vertex shader were vector. Bugger. :(

Edited my pix.

I'll assume for the time being that R520 has 2 vector ALUs in the vertex shaders. That way if the driver compiler wants to issue a vector+scalar or vector+vector, it can. :)

Though I wouldn't be surprised to discover that it only makes sense to issue a vector+scalar in a vertex shader...

Jawed

Geo
11-Apr-2005, 14:49
All that "ordinary skill in the art" can make a fellow feel quite inadequate. :(

trinibwoy
11-Apr-2005, 14:52
All that "ordinary skill in the art" can make a fellow feel quite inadequate. :(

Glad I'm not the only one that felt that way after reading that :D

Geo
11-Apr-2005, 17:07
ill go ahead and say 225ish. i dont see ati rly pushing any boundaries transistor wise or performance wise when they are going to a new process and nvidia isnt launching anything new.

I'm not signing up for this yet, but it would be an interesting option to use your high-end part to follow your old "middle part first" process transition strategy. That might actually fit with a more ambitious "Take II" refresh strategy that better lines up against NV's 90nm launch. Not signing up yet, but there is a little bit of synchronicity there that is attractive.

But as to "nv isn't launching anything new", well. . .

Demirug
11-Apr-2005, 17:30
[0030] FIG. 5 illustrates a block diagram representing the further execution of the command threads upon completion of all embedded commands therein. The ALU 308 is coupled to a render backend 350 via connection 352 and to a scan converter 356 via connection 358. As recognized by one having ordinary skill in the art, the ALU 308 may be operably coupled to the render backend 350 such that the bus 352 incorporates one or more of a plurality of connections for providing the completed command thread, such as command thread 316 of FIG. 4, thereto. Furthermore, as recognized by one having ordinary skill in the art, ALU 308 may be operably coupled to the scan converter 356 such that the connection 358 may be one or more of a plurality of connections for providing the executed command thread, such as command thread 322 of FIG. 4, to the scan converter 356. As discussed above, once the command thread, have an indicator bit, such as the done flag, set indicating all of the commands in the thread have been executed, the completed command thread is further provided in the processing pipeline. Moreover, the render backend 350 may be any suitable rendering backend for graphics processing as recognized by one having ordinary skill in the art. The scan converter 356 may be any suitable scan converter for graphics processing as recognized by one having ordinary skill in the art.

http://www.cupidity.f9.co.uk/b3d04.jpg

My understanding is that SM3 requires fog and blend to be implemented in shader code, so that's one function that won't be appearing in the render backend.

Jawed

Only fog needs to be done in the shader. Blend operations are still the job of the raster operators. But doing fog in the shader is a simplification because it is no longer a separate pipeline stage as in SM <= 2.0.

wireframe
12-Apr-2005, 01:27
I really like how this thread has become one huge apology why R520 doesn't need to provide anything. C'mon, what happened to the 24/8 or 32 shared pipelines and the 512-bit memory interface? If we keep this up much longer the R520 doesn't need to be much more impressive than the original Radeon to have the ooohs and aaaahs.( "will ya look at that! who'da thunk it!")

If ATI is not going to use 90nm to their advantage, why use it at all?

I realize many of you speak about these matters from an investment perspective, but let's at least try to keep the technological worship alive (besides, whatever happened to the tech chanting with NV40? who cares if it works? It does have a technological peak compared to all else. why isn't it more widely praised in these parts...inhabited by enthusiasts, as it were?)

wireframe
12-Apr-2005, 01:30
This needs its own entry because it is wildly separated from what I posted above. I just need to ask:

Let's forget how R520 does what it will undoubtedly do. How much of a performance increase are you expecting from this, now legendary, part?

Would the hooplah be worth it for 40% on top of NV40? 30%? You tell me.

PS. I think it's more appropriate to compare it to NV40 than R420 because that's where things are going and that is the competition. Feel free to chastise me on that point, however.

Jawed
12-Apr-2005, 01:31
I really like how this thread has become one huge apology why R520 doesn't need to provide anything. C'mon, what happened to the 24/8 or 32 shared pipelines and the 512-bit memory interface? If we keep this up much longer the R520 doesn't need to be much more impressive than the original Radeon to have the ooohs and aaaahs.( "will ya look at that! who'da thunk it!")

If ATI is not going to use 90nm to their advantage, why use it at all?

I realize many of you speak about these matters from an investment perspective, but let's at least try to keep the technological worship alive (besides, whatever happened to the tech chanting with NV40? who cares if it works? It does have a technological peak compared to all else. why isn't it more widely praised in these parts...inhabited by enthusiasts, as it were?)
Maybe all eyes are on Xbox 360 versus PS3? Out of R500 and R520, R500 is way more interesting to me :)

Kaleidoscope, whatever the hell it is, seems to be the only thing we have a sniff of that's not easy to extrapolate.

On everything else we're just going round in circles.

Jawed

wireframe
12-Apr-2005, 01:37
Maybe all eyes are on Xbox 360 versus PS3?
I bet a lot of eyes are. This goes hand in glove with my belief that most people posting are more interested in the business aspects, while masking that, of these companies rather than the technological whizz-bang.


Kaleidoscope, whatever the hell it is, seems to be the only thing we have a sniff of that's not easy to extrapolate.
I need to read about this one. Kaleidoscope. bahhh.. It has inspired so many thoughts that it has already been personally worth it. I just want to know what ATI's interpretation is already! :P


On everything else we're just going round in circles.
Yes, but how many polys per circle? Let's talk about the level of tessellation where we see the point!

Jawed
12-Apr-2005, 01:54
This needs its own entry because it is wildly separated from what I posted above. I just need to ask:

Let's forget how R520 does what it will undoubtedly do. How much of a performance increase are you expecting from this, now legendary, part?

Would the hooplah be worth it for 40% on top of NV40? 30%? You tell me.

PS. I think it's more appropriate to compare it to NV40 than R420 because that's where things are going and that is the competition. Feel free to chastise me on that point, however.

I'd guess that R520 is a technology implementation first and foremost, rather than a balls-out crush NVidia part. Dave does keep trying to manage our expectations...

If you combine SM3, Kaleidoscope and 90nm, you could argue that it's fairly risky for ATI. A lot of that risk (all those technologies? but not unified shading or virtual memory?) is shared with R500.

I dare say it's easy to say that NVidia is sitting pretty - a few tweaks to NV40, refined process etc. to keep R520 at bay.

The 16-1-3-1 rumour about R580 - now that's a corker. And if R600 is only 14 months (?) away, do R520/R580 really matter very much? We sorta know that conventional pipelines are on their last legs in ATI's eyes.

Doncha think it'll be kinda boring when R520/R580 will be conventional pipeline parts, while R500 will be racing ahead into unified shader glory. Who wants R580 then, eh?...

Jawed

Dave Baumann
12-Apr-2005, 02:07
Maybe all eyes are on Xbox 360 versus PS3?
I bet a lot of eyes are. This goes hand in glove with my belief that most people posting are more interested in the business aspects, while masking that, of these companies rather than the technological whizz-bang.

IMO, this is backwards - the interest is on the consoles because of the technology. Certainly where ATI is concerned it looks like we are going to get a better peek into their longer term future by looking at Xenon than we are R520 and it looks to be a genuinely different part. With NVIDIA I think the general expectation is that their console tech won't significantly differ from similar generation PC technology but there's still a lot of intrigue over their implementation, the memory and how it's going to integrate with Cell, some of which may have similar ramifications for the desktop.

Geo
12-Apr-2005, 02:12
I really like how this thread has become one huge apology why R520 doesn't need to provide anything. C'mon, what happened to the 24/8 or 32 shared pipelines and the 512-bit memory interface? If we keep this up much longer the R520 doesn't need to be much more impressive than the original Radeon to have the ooohs and aaaahs.( "will ya look at that! who'da thunk it!")

If ATI is not going to use 90nm to their advantage, why use it at all?

I realize many of you speak about these matters from an investment perspective, but let's at least try to keep the technological worship alive (besides, whatever happened to the tech chanting with NV40? who cares if it works? It does have a technological peak compared to all else. why isn't it more widely praised in these parts...inhabited by enthusiasts, as it were?)

Hmm, I wasn't thinking of this as a "what I'm wishing and hoping for" thread.

wireframe
12-Apr-2005, 02:12
Doncha think it'll be kinda boring when R520/R580 will be conventional pipeline parts, while R500 will be racing ahead into unified shader glory. Who wants R580 then, eh?...

You better believe I think it's boring. All this "No, R400 is so advanced we can't release it yet...or ever...." and the "this technology is so advanced that we want to keep it small in an Xbox for a while." R300 wasn't a success because it was traditional. It boldly went where no part had gone before and it took the NV40 to go there and beyond.

Am I the only person with an interest in the actual hardware parts that found R420 to be completely boring? It brought nothing new...after all that talk. Well, lemme tell ya, R520 better, because I won't be terribly happy buying one to only get NV40*1.4.

We know ATI has the capability and they certainly have developed the jaw to talk about it, so I think we better expect great things. At least I will expect to read great possibilities before I am let down, if that is the fact. Personally, I refuse to believe R520 is anything but boring. I also don't want to hear about equivalencies because if ATI releases R520 to be equivalent to something ("but we do it in new interesting ways!") they better shut up about things like R400 being too advanced for this puny earth...polluted by earthlings... :P Get real! R300 was all about getting real. Let's at least assume that R400 -> R500 -> R520 is more of the same. I see no reason not to (other than the fact that R420 was a complete letdown where people thought ATI would bring it on and ATI decided to milk some more instead. Good wood for the investor crowd but boring for the enthusiasts.)

wireframe
12-Apr-2005, 02:14
Hmm, I wasn't thinking of this as a "what I'm wishing and hoping for" thread.

Then it makes it even more interesting why your assumptions have dwindled since the first page. Please elaborate on why you think it no longer needs more than 16 pipelines or even more transistors than NV40.

Jawed
12-Apr-2005, 02:16
With NVIDIA I think the general expectation is that their console tech won't significantly differ from similar generation PC technology but there's still a lot of intrigue over their implementation, the memory and how it's going to integrate with Cell, some of which may have similar ramifications for the desktop.
I'll be disappointed if that doesn't happen.

With these rumours of "multi-chip" NVidia implementations in the future for desktop parts, is there, perhaps, a possibility that NVidia could license Cell tech to replace the gubbins that precede the pixel shader pipelines? In other words could NVidia implement a "mini-PS3" as a desktop part? Wouldn't that be groovy?

Jawed

wireframe
12-Apr-2005, 02:18
Maybe all eyes are on Xbox 360 versus PS3?
I bet a lot of eyes are. This goes hand in glove with my belief that most people posting are more interested in the business aspects, while masking that, of these companies rather than the technological whizz-bang.

IMO, this is backwards - the interest is on the consoles because of the technology. Certainly where ATI is concerned it looks like we are going to get a better peek into their longer term future by looking at Xenon than we are R520 and it looks to be a genuinely different part. With NVIDIA I think the general expectation is that their console tech won't significantly differ from similar generation PC technology but there's still a lot of intrigue over their implementation, the memory and how it's going to integrate with Cell, some of which may have similar ramifications for the desktop.

Interesting. I had a general feeling that the Nvidia part for Sony's PS3 would have very little to do with Nvidia's overall roadmap and was specifically tailored for Sony's purposes. On the other hand, I also felt that ATI's contribution to Xbox would be more heavily dominated by ATI's vision of where they want their overall design process to go. I will play with an open hand and state that this is tied to how I view ATI as a bed partner with MS while Nvidia is heavily resisting MS's takeover of the 3D market (this may have little to do with this particular design, but this is how I see it. Cure me for free.)

Dave Baumann
12-Apr-2005, 02:24
I had a general feeling that the Nvidia part for Sony's PS3 would have very little to do with Nvidia's overall roadmap and was specifically tailored for Sony's purposes.

They have already explicitly stated that it is a custom chip but architected from their next generation of PC products.

Geo
12-Apr-2005, 02:28
This needs its own entry because it is wildly separated from what I posted above. I just need to ask:

Let's forget how R520 does what it will undoubtedly do. How much of a performance increase are you expecting from this, now legendary, part?

Would the hooplah be worth it for 40% on top of NV40? 30%? You tell me.

PS. I think it's more appropriate to compare it to NV40 than R420 because that's where things are going and that is the competition. Feel free to chastise me on that point, however.

"Compare", how? Performance they are already faster in most scenarios. Features? I wouldn't expect anything less than NV40.

The question is are they out to beat the 110nm NV part with R520 or the 90nm NV part? Originally I was assuming the latter. Lately I've been wondering if that job has been given to R580 in order to lessen the risk of the 90nm move and "keep their powder dry" (and hidden) for closer to NV's 90nm release.

wireframe
12-Apr-2005, 02:51
This needs its own entry because it is wildly separated from what I posted above. I just need to ask:

Let's forget how R520 does what it will undoubtedly do. How much of a performance increase are you expecting from this, now legendary, part?

Would the hooplah be worth it for 40% on top of NV40? 30%? You tell me.

PS. I think it's more appropriate to compare it to NV40 than R420 because that's where things are going and that is the competition. Feel free to chastise me on that point, however.

"Compare", how? Performance they are already faster in most scenarios. Features? I wouldn't expect anything less than NV40.

Well, I didn't put any negative percentages there on purpose. I would fully expect R520 to beat up NV40 in more ways than one. Why did you answer so conservatively? Was my original statement, elsewhere, about this thread becoming "apologetic" right on? What exactly are you expecting from R520 in terms of performance, without thinking about the how, for it to be a meaningful entrant to the market?

It better be a lot more than the conservative numbers I suggested (mostly to coax the "5000%!!!!!" response). The Legend of ATI depends on this, right? It better be good, and when I say good I mean in more ways than as a corporate investment and product. I mean it as a technological masterpiece. ...Or....are we at the stage already when we are writing ATI off because "ohhh...but look at the profit a competitive low-transistor-count 90nm part will generate"?

I am not writing ATI off and I fully expect even a 512-bit memory bus or equivalent. My $1 saying they can shake it more than once.

wireframe
12-Apr-2005, 02:55
I had a general feeling that the Nvidia part for Sony's PS3 would have very little to do with Nvidia's overall roadmap and was specifically tailored for Sony's purposes.

They have already explicitly stated that it is a custom chip but architected from their next generation of PC products.

Can we still be best buddies after I ask you to show me the statement?:P I am sure you are right, but I would like to read it again and look at your ink marks.

PS. I better explain that the reason I ask this is because I recall reading that Nvidia was being cloudy and merely stating that their "PS3" solution used technology that may be featured in upcoming desktop PC products and not necessarily a distinct derivative one way or the other.

jvd
12-Apr-2005, 03:03
wireframe, you expect Nvidia to make a fully custom GPU that's not based off any GPU in Sony's pipeline in under a year?

wireframe
12-Apr-2005, 03:19
wireframe, you expect Nvidia to make a fully custom GPU that's not based off any GPU in Sony's pipeline in under a year?

Do you think Sony would issue a tender for a video solution that needs to be fulfilled within a year?

BTW, I think it is generally overestimated how long it takes to make a design once the target specs are known. It is different to make a requested design than to make an adaptive design for a market where the unit itself will be the sole determinant of success. The GPU for "PS3" is not the whole story and I am sure there are requests in there that don't make sense to Nvidia's vision or perception of reality, but Sony is the customer and the customer is always right.

Geo
12-Apr-2005, 03:49
I had a general feeling that the Nvidia part for Sony's PS3 would have very little to do with Nvidia's overall roadmap and was specifically tailored for Sony's purposes.

They have already explicitly stated that it is a custom chip but architected from their next generation of PC products.

Can we still be best buddies after I ask you to show me the statement?:P I am sure you are right, but I would like to read it again and look at your ink marks.

PS. I better explain that the reason I ask this is because I recall reading that Nvidia was being cloudy and merely stating that their "PS3" solution used technology that may be featured in upcoming desktop PC products and not necessarily a distinct derivative one way or the other.

Well, I definitely remember reading an interview with an NV guy where they said that.

wireframe
12-Apr-2005, 03:57
I had a general feeling that the Nvidia part for Sony's PS3 would have very little to do with Nvidia's overall roadmap and was specifically tailored for Sony's purposes.

They have already explicitly stated that it is a custom chip but architected from their next generation of PC products.

Can we still be best buddies after I ask you to show me the statement?:P I am sure you are right, but I would like to read it again and look at your ink marks.

PS. I better explain that the reason I ask this is because I recall reading that Nvidia was being cloudy and merely stating that their "PS3" solution used technology that may be featured in upcoming desktop PC products and not necessarily a distinct derivative one way or the other.

Well, I definitely remember reading an interview with an NV guy where they said that.

Ok, so maybe you can show it to me with some of your own highlights. You know, just to help me out...so this doesn't become a "well, I remember when Poland was a part of Germany" fact for me. I am sorry to ask for the actual text, but I sincerely don't recall reading that the Sony project has/had any direct link to their PC technology parts. Any actual text out of the horse's mouth would be greatly appreciated.

Xmas
12-Apr-2005, 04:13
http://www.xbitlabs.com/articles/editorial/display/ces2005.html

This chip is a custom version of our next generation GPU.

Geo
12-Apr-2005, 04:15
Hmm, I wasn't thinking of this as a "what I'm wishing and hoping for" thread.

Then it makes it even more interesting why your assumptions have dwindled since the first page. Please elaborate on why you think it no longer needs more than 16 pipelines or even more transistors than NV40.

My assumptions have dwindled? Checked my sig lately? :lol:

I don't recall ever assuming more than 16 pipes. The closest I got in that direction was in wondering if the source of the "24" pipeline rumor was from unified shaders (16 + 8).

I will own up to coming down somewhat on transistors, from 300-350m to somewhat less than 300m, but upper 200's. Why did I do that? In part from reevaluating the "smaller die" analyst comment and in part from discussion of what is expected in R580 that leads me to believe it might be more than the typical refresh. The assumption that time-wise R580 lines up better against NV's 90nm part serves as a check that such a strategy would have some justification. I'm assuming that 300-350M is pretty close to the limit of what 90nm can accomplish, and if R580 is going to be significantly more brawny then in my mind that means that R520 would have to be significantly less brawny.

Tho so far as that goes, I've also been playing a little interlocutor role on this thread, tho maybe I shouldn't have. I've asked a lot of questions and posed a lot of hypotheticals that have nothing to do with what I think and everything to do with finding out what the other participants are thinking and why they are thinking it.

In part because I enjoy it, and in part to keep the convo going until a new tidbit/rumor shows up somewhere to keep the convo going on its own steam. :)

http://www.beyond3d.com/forum/viewtopic.php?t=21341&postdays=0&postorder=asc&start=236

So, reading that again, I suppose I owe Wavey a mea culpa as I did imply I got 300-350m from him, but I'd agree he's never said any such thing anywhere that I've seen. Also that I said even then I was starting to lean fewer transistors and a higher clock.

I have not given up on the bus (hence my sig), even tho for the life of me I have not been able to square the circle (so to speak) on whether it only makes sense as an enabler for unified shaders. I get very strong, but mixed, signals on that which are contradictory on their face --but no doubt in hindsight will be "shoulda had a V8!" clear.

trinibwoy
12-Apr-2005, 04:37
This is the next generation GPU, so after the GeForce 6 series this is going to be the next generation. So, this will be everything we have in GeForce 6 + whatever else we bring out.

These are expensive chips to develop. So, the fact that we didn’t have to do that development just for the Sony application obviously is a major economy of scale, because we are doing the development for the new chip anyway.

I think it's a safe bet that the GPU in the PS3 is going to very closely resemble Nvidia's next gen part.

Geo
12-Apr-2005, 05:02
http://www.xbitlabs.com/articles/editorial/display/ces2005.html

This chip is a custom version of our next generation GPU.

Damn. I would see this post *after* I went to hunt it up. Yes, this is the one I had in mind.

Also, As you know we don’t talk about next generation products but it’s our next generation of GPU.

He actually said it three or four times with minor variations over the course of the interview. No wonder I remember reading it. :lol:

And here's the B3D thread: http://www.beyond3d.com/forum/viewtopic.php?t=19788&postdays=0&postorder=asc&highlight=roman&start=0

Note Wavey noting how hard Roman was driving home the point under discussion here.

Ailuros
12-Apr-2005, 09:52
I am not writing ATI off and I fully expect even a 512-bit memory bus or equivalent. My $1 saying they can shake it more than once.

I have severe doubts about that.

I personally never expected any significant performance increases from either/or IHV and their upcoming new batch of high end accelerators (successors to NV40/R480).

Yes I do expect something rather like 1.4-1.5*NV40 at best from either side. Official statement from Michael Hara on B3D's frontpage (2nd time I link to it in this thread):

“Well, from an architecture standpoint we’re just still at the beginning of shader model 3.0. And we need to give the programmers out there some time to continue to really learn about that architecture. So in the spring refresh what you’ll see is a little bit faster versions...

...I think you’ll see the industry move up a little bit in performance. But I don’t think you’ll see any radical changes in architecture. I doubt you’ll see any radical changes in architecture even in the fall. When we came out with GeForce 6, we tend to create a revolutionary architecture about every actually two years. And then we derive from it for the following time. So even the devices that we announced this fall, that will be I think a lot more powerful than the ones we actually had a year ago. Architecturally we’re still in the shader model three type era.”

How do you define exactly "a little bit"? The spring refresh should be aiming to compete against R520, while the 90nm fall part most likely against R580. I'd use terms like "huge" or "massive" for significant performance leaps instead of "a little bit".

Well, lemme tell ya, R520 better, because I won't be terribly happy buying one to only get NV40*1.4.

Both performance and feature-wise I guess that's exactly what you should expect IMHO.

If ATI is not going to use 90nm to their advantage, why use it at all?

Pumping up core frequency beyond 650MHz is actually a way of taking advantage of the manufacturing process in question. An alternative for the competition would be to use 110nm and more quads in order to reach equivalent fill-rates.
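To put rough numbers on that trade-off, here is a small Python sketch. It assumes peak pixel fill-rate is simply pipelines times core clock (one pixel per pipeline per clock), and the 16-pipe/650MHz and 24-pipe figures are purely illustrative, not known specs of any part.

```python
def fill_rate_mpix(pipelines: int, clock_mhz: float) -> float:
    """Peak fill-rate in Mpixels/s, assuming one pixel per pipeline per clock."""
    return pipelines * clock_mhz


def clock_needed_mhz(target_mpix: float, pipelines: int) -> float:
    """Core clock (MHz) a part with `pipelines` pipes needs to hit the target."""
    return target_mpix / pipelines


# Hypothetical high-clock part: 4 quads (16 pipes) at 650MHz.
high_clock = fill_rate_mpix(16, 650)

# Hypothetical wider part: 6 quads (24 pipes) matching that fill-rate.
wide_clock = clock_needed_mhz(high_clock, 24)

print(f"16 pipes @ 650MHz: {high_clock:.0f} Mpix/s")
print(f"24 pipes need about {wide_clock:.0f}MHz for the same peak")
```

Under those assumptions, the wider part reaches the same peak figure at roughly two thirds of the clock, which is the "more quads at 110nm" route.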

However I think both IHVs would have felt a lot better if GDDR4, for example, were already available.

***edit: I think that Wavey has already thrown a couple of hints around that as soon as some details on both highly expected next generation consoles appear, those will be the most interesting material for related debates. I'd rather guess myself that this year won't bring any significant leaps technology-wise for desktop GPUs.

WaltC
12-Apr-2005, 12:04
...

I will play with an open hand and state that this is tied to how I view ATI as a bed partner with MS while Nvidia is heavily resisting MS's takeover of the 3D market (this may have little to do with this particular design, but this is how I see it. Cure me for free.)

I don't follow at all how MS is "taking over" and nV is "heavily resisting" anything MS is doing...;) Last time I checked nV was boasting how "DX-compliant" its products were, just like ATi, and the fact is that both nV and ATi are independently owned by interests other than MS. Perhaps you have mistaken voluntary industry collaboration among hardware and software companies to establish standards with some kind of takeover mentality or myth. Standardization and support of APIs is a requirement for 3d game developers--there's nothing sinister about it, imo. In order for the pieces to come together everybody has to pay homage to standards.

PatrickL
12-Apr-2005, 12:24
Don't you think that the 520's limiting factor will ultimately be the need for ATI to have them widely available?
When I read some posts I wonder if people realize that whatever the real performance of the R520 is, any lack of availability will make it a perceived failure, in my opinion.

Chalnoth
12-Apr-2005, 13:57
Don't you think that the 520's limiting factor will ultimately be the need for ATI to have them widely available?
When I read some posts I wonder if people realize that whatever the real performance of the R520 is, any lack of availability will make it a perceived failure, in my opinion.
But how much performance would ATI have to sacrifice to make the R520 available in high volume (for a high-end product)? Given the current cutthroat competition, I'm willing to bet that both nVidia and ATI will release a super high-end part in this next generation at the $500+ price range that just isn't available in any significant quantities.

So, if you care about availability, take a look at which products they are able to get out right away (which will likely be the products that will take the place of the 6800 GT and X800 Pro in the marketplace).

london-boy
12-Apr-2005, 14:16
Could this thread be any longer? Considering i still have to find real info on the R520... Might be buried between the 98th and the 102nd page...

PatrickL
12-Apr-2005, 14:34
Don't you think that the 520's limiting factor will ultimately be the need for ATI to have them widely available?
When I read some posts I wonder if people realize that whatever the real performance of the R520 is, any lack of availability will make it a perceived failure, in my opinion.
But how much performance would ATI have to sacrifice to make the R520 available in high volume (for a high-end product)? Given the current cutthroat competition, I'm willing to bet that both nVidia and ATI will release a super high-end part in this next generation at the $500+ price range that just isn't available in any significant quantities.

So, if you care about availability, take a look at which products they are able to get out right away (which will likely be the products that will take the place of the 6800 GT and X800 Pro in the marketplace).

You missed my point. The X800 XT-PE while faster than the 6800 Ultra got all the bad press as it was not available enough in the retail market. My point is it is not a good move to have the fastest version of the chip if you can't get a positive image from it.

Mariner
12-Apr-2005, 15:16
I think it's already been mentioned (somewhere) on this board that the supply problems experienced with the 0.13 micron low-K chips aren't as likely to be experienced for the 0.09 micron chips. The fab capacity running 0.09 micron is much much higher as it is a 'mainstream' process as opposed to the more specialist 0.13 low-K.

This assumes that yields are good etc etc.

karlotta
12-Apr-2005, 15:34
well the 520 is .09 low~k

Sunrise
12-Apr-2005, 15:43
Don't you think that the 520's limiting factor will ultimately be the need for ATI to have them widely available?
When I read some posts I wonder if people realize that whatever the real performance of the R520 is, any lack of availability will make it a perceived failure, in my opinion.

As discussed before, ATI isn't really having architectural problems "building" their current 130nm low-k GPUs, it has something to do with manufacturing capacity @ TSMC. They only have one fab that is capable of 130nm low-k production while TSMC's 90nm low-k "NexSys" technology is already available in about 2-3(4) fabs (Fab12A, Fab12B/C, Fab14 in Q4) all producing on 300mm wafers. TSMC also states that defects have lowered and the transition from 130nm -> 90nm went rather smoothly compared to 130nm-transition a while back. Being both a high volume and high-end process, this could be a significant advantage for ATI in many aspects.

ATI can and surely will decide at what point they have sufficient yields for their next high-end part and push it out fast. You have to keep in mind though that I wouldn't expect anything "out of this world" from them, not from their "first" 90nm high-end part at least. R520 should be a very complex chip so they have to decide at what point they can manage to release it with confidence (features and speed) and not be hindered by availability, so you are certainly right in mentioning this.

MuFu
12-Apr-2005, 16:00
What is generally considered to be the best strategy when adopting a new node for the first time...

Maximise die size, focus on computational throughput per clock.
Minimise die size, focus on clock speed.
:?:

Obviously a balanced approach will prevail, but which way does it tend to swing when the analog/fab side is relatively uncharted?

_xxx_
12-Apr-2005, 16:00
AFAIK the chip is already 99% finalized, so there's not much room for changing anything significant. They'll have to live with their decision, whatever it should be.

But I still think of the possibility of nV unexpectedly coming up with some part that could seriously rock in terms of speed. Just like they surprised us all with that PS3 deal. Damn, I'm so curious!

Mariner
12-Apr-2005, 17:48
well the 520 is .09 low~k

Yep - and so is all TSMC 0.09 production. :)

karlotta
12-Apr-2005, 19:43
well the 520 is .09 low~k

Yep - and so is all TSMC 0.09 production. :)

rgr that. And with a .09 part for nvda this fall? Then that would have to be UMC, no low-k for nvda.

ninelven
12-Apr-2005, 19:44
nm... i don't know what i was thinking.

GwymWeepa
12-Apr-2005, 20:37
*wanders in* damn, 45 pages? Alright, so that I don't have to wander through all of this, when are we expecting this new card?

Chalnoth
12-Apr-2005, 20:43
You do know if there was any real information the thread would be much shorter, right? :)

MuFu
12-Apr-2005, 20:56
You do know if there was any real information the thread would be much shorter, right? :)

Actually it would be slightly longer. A few posts have mysteriously disappeared. ;)

Geo
12-Apr-2005, 22:42
*wanders in* damn, 45 pages? Alright, so that I don't have to wander through all of this, when are we expecting this new card?

Clubhouse Leader is announcement end of May/first week of June with availability shortly thereafter (much more shortly than recent releases for ATI).

I saw a tidbit somewhere that leads me to believe that ATI has changed in a way that would support the idea of a shortened time between announcement and availability vs earlier releases.

But the primary sources for the above are what Orton told the financial analysts and what ATI told Wavey at Cebit. Put 'em both together and you get what I said above.

ANova
12-Apr-2005, 23:54
Don't you think that the 520's limiting factor will ultimately be the need for ATI to have them widely available?
When I read some posts I wonder if people realize that whatever the real performance of the R520 is, any lack of availability will make it a perceived failure, in my opinion.
But how much performance would ATI have to sacrifice to make the R520 available in high volume (for a high-end product)? Given the current cutthroat competition, I'm willing to bet that both nVidia and ATI will release a super high-end part in this next generation at the $500+ price range that just isn't available in any significant quantities.

So, if you care about availability, take a look at which products they are able to get out right away (which will likely be the products that will take the place of the 6800 GT and X800 Pro in the marketplace).

You missed my point. The X800 XT-PE while faster than the 6800 Ultra got all the bad press as it was not available enough in the retail market. My point is it is not a good move to have the fastest version of the chip if you can't get a positive image from it.

Does the Ultra Extreme ring a bell?

trinibwoy
13-Apr-2005, 00:01
Does the Ultra Extreme ring a bell?

I forgot that even existed....

Tim Murray
13-Apr-2005, 00:27
Does the Ultra Extreme ring a bell?

I forgot that even existed....
that's because it never did, officially

pakotlar
13-Apr-2005, 01:16
Does the Ultra Extreme ring a bell?

I forgot that even existed....
that's because it never did, officially

No, it did, officially, at launch. It then promptly disappeared :lol:

Bouncing Zabaglione Bros.
13-Apr-2005, 01:19
Does the Ultra Extreme ring a bell?

I forgot that even existed....
that's because it never did, officially

IIRC didn't Nvidia give some to websites for benching purposes? At least the XTPE was a product you could buy if you looked hard enough and paid enough.

I sure hope everyone has better availability with the upcoming .09 products though...

Pete
13-Apr-2005, 02:10
What is generally considered to be the best strategy when adopting a new node for the first time...

Maximise die size, focus on computational throughput per clock.
Minimise die size, focus on clock speed.
:?:

Obviously a balanced approach will prevail, but which way does it tend to swing when the analog/fab side is relatively uncharted?

Interestingly enough, nV seems to choose the first path (and got burned with NV30), and the post-R300 ATi has kind of chosen the second (9600 Pro and XT were both small and fast, whereas X300 went for small but relatively slow).

Judging from recent history, the real answer is probably closer to as good as they can do at the time. :)

BZB, Anandtech was (AFAIK) first out of the gate with 425 and even 450MHz "unofficial" 6800UEs. Eventually, BFG and I think XFX shipped 6800Us at 425MHz, but 400MHz is still the stock speed.

I wonder if both ATi and nVidia are waiting until after the Xbox 2 and PS3 to announce? The timing seems curiously coincidental.

CMAN
13-Apr-2005, 02:51
I think eVGA shipped some 6800 Super Duper Ultras at 450 MHz for about a month. :lol:

kemosabe
13-Apr-2005, 03:01
So now that we've firmly established how little we actually know about R520, anyone care to speculate about NV_ENGR1 (http://www.pcisig.com/developers/compliance_program/integrators_list/pcie/)? :?

Chalnoth
13-Apr-2005, 04:35
Sounds to me like a test product for testing the PCI Express interface with other parts. It may, for example, be a dummy PCIe-AGP bridge solution with some null device on the AGP bus. Alternatively, it could just be the bridge itself.

But, regardless, that codename just doesn't sound like a graphics card.

Edit:
Well, actually, now that I think about it, this does make a bit less sense. This is a new addition, after all. So, it is most likely, then, a placeholder name for something we'll see later.

IgnorancePersonified
13-Apr-2005, 04:54
It's quite obviously the nvidia ppu unit soon to retail.

Geo
13-Apr-2005, 16:39
http://www.theinquirer.net/?article=22486

You'd think they are a 24hr cable news outlet the way they operate whether they have anything to say or not.

tEd
13-Apr-2005, 17:34
http://www.theinquirer.net/?article=22486

You'd think they are a 24hr cable news outlet the way they operate whether they have anything to say or not.

I'm wondering what they gonna report if they find out/turns out that it won't have 24/32 pipelines.

Are they gonna be shocked or disappointed and who are they gonna blame for the misinformation?

wireframe
13-Apr-2005, 18:33
http://www.theinquirer.net/?article=22486

You'd think they are a 24hr cable news outlet the way they operate whether they have anything to say or not.

I'm wondering what they gonna report if they find out/turns out that it won't have 24/32 pipelines.

Are they gonna be shocked or disappointed and who are they gonna blame for the misinformation?

Does it matter as long as it is done in a dramatic fashion? Drama sells and The Inq is quite good at it.

tEd
13-Apr-2005, 18:40
http://www.theinquirer.net/?article=22486

You'd think they are a 24hr cable news outlet the way they operate whether they have anything to say or not.

I'm wondering what they gonna report if they find out/turns out that it won't have 24/32 pipelines.

Are they gonna be shocked or disappointed and who are they gonna blame for the misinformation?

Does it matter as long as it is done in a dramatic fashion? Drama sells and The Inq is quite good at it.

I'm looking forward to it. Could be fun :)

wireframe
13-Apr-2005, 18:56
I'm looking forward to it. Could be fun :)

Heh. Yeah, I always get a laugh out of the Inq, whether I am laughing with them or at them. :D

On a larger scale I am very interested in how the R520 launch and final show of the numbers will be handled. The expectations have been driven up to immense proportions on a grass-roots level, especially with all this 'historic' hinting at how great R400 was going to be and how it is "too advanced". So, if it is not the unbeatable beast that it has been played up to be I would expect a lot of grass-roots work at bringing expectations back down and if it is "all that" then I wonder how they will play it because depending on the feature set, they are 'merely' catching up and maybe not bringing anything earth shattering to the table except the performance.

The more I think about it, the more I realize I hate thinking about it. I just want this thing out already and it better be great and someone better be willing to sell me one at a price that is lower than "my soul".

Joe DeFuria
13-Apr-2005, 19:09
The expectations have been driven up to immense proportions on a grass-roots level, especially with all this 'historic' hinting at how great R400 was going to be and how it is "too advanced".

R520 has little to do with R400 if the rumors / "accepted as true" knowledge is in fact true. (It is thought that R520 might borrow some stuff from R400, but the "great stuff" about R400 is supposedly channeled more into R500 (Xenon) and the R600 (next gen PC architecture).)

Tim Murray
13-Apr-2005, 19:16
The expectations have been driven up to immense proportions on a grass-roots level, especially with all this 'historic' hinting at how great R400 was going to be and how it is "too advanced".

R520 has little to do with R400 if the rumors / "accepted as true" knowledge is in fact true. (It is thought that R520 might borrow some stuff from R400, but the "great stuff" about R400 is supposedly channeled more into R500 (Xenon) and the R600 (next gen PC architecture).)
Which makes sense, as there haven't really been any radical paradigm shifts since DX9/R300.

wickedld9
13-Apr-2005, 19:22
I think eVGA shipped some 6800 Super Duper Ultras at 450 MHz for about a month. :lol:

For ~7 months they had weekly drawings for the chance to buy one. Now you can purchase them freely through their site.
http://www.evga.com/products/moreinfo.asp?pn=256-A8-N346-AX

Nice juicy rumors in this thread....Is it June yet? I'm ready to get back on the ATI side of things.

Geo
13-Apr-2005, 20:16
R520 has little to do with R400 if the rumors / "accepted as true" knowledge is in fact true. (It is thought that R520 might borrow some stuff from R400, but the "great stuff" about R400 is supposedly channeled more into R500 (Xenon) and the R600 (next gen PC architecture).)

I would find this so much easier to swallow if they'd just bumped the pipes 50% and moved on. But apparently they didn't.

Joe DeFuria
13-Apr-2005, 20:30
Right...they obviously need support of SM 3.0 (which requires FP32 pixel shader support) for if nothing else marketing reasons.

As I said though, they likely did take "some stuff" from the R400 architecture, but it's Xenon (R500), not R520, that is supposed to be the closest descendant of the R400.

Dave mentioned in some other thread (or perhaps earlier in this one), and I agree, that going with a "radical" departure in architecture (ala R500) in a PC system where you need to run existing software that was designed to run on "traditional" architectures, is much more of a risk than doing it in a closed box upon which software is being designed specifically for it.

With such a relatively small difference (between SM 3.0 and 2.0), it doesn't make all that much sense to bring an entirely new architecture in. It makes the most sense when there is a more radical change in the API / programming model.

So I don't expect to see an R400-type architecture appear in the PC space until MS is ready to release WGF 2.0 / Longhorn.

Pete
13-Apr-2005, 20:35
For ~7 months they had weekly drawings for the chance to buy one. Now you can purchase them freely through their site.
http://www.evga.com/products/moreinfo.asp?pn=256-A8-N346-AX

Cool. Seriously.

As to the rumor-mongering at hand, I'm not sure the R520 hype ever became as overboiled as you say, wireframe. The forums here have distinguished between the "tweener" R520 and the truer next-gen R600 (aka R400) for quite some time. I guess Richard Huddy's "too sexy for TV" PDF comment did light a fire with its promise of super branching performance and the expectation that it would arrive late last year (IIRC), but that was all we had to go on for a long while. You'd think even nV would have figured out how to achieve decent branching performance by now.

Natoma
13-Apr-2005, 20:47
In any event the R520, whatever it ends up being, will make a nice upgrade from my 9800 Pro 256MB card. :)

tEd
13-Apr-2005, 21:20
For ~7 months they had weekly drawings for the chance to buy one. Now you can purchase them freely through their site.
http://www.evga.com/products/moreinfo.asp?pn=256-A8-N346-AX

Cool. Seriously.

As to the rumor-mongering at hand, I'm not sure the R520 hype ever became as overboiled as you say, wireframe. The forums here have distinguished between the "tweener" R520 and the truer next-gen R600 (aka R400) for quite some time. I guess Richard Huddy's "too sexy for TV" PDF comment did light a fire with its promise of super branching performance and the expectation that it would arrive late last year (IIRC), but that was all we had to go on for a long while. You'd think even nV would have figured out how to achieve decent branching performance by now.

Richard Huddy did not promise super branching performance. The words were "..with decent performance..."

DegustatoR
13-Apr-2005, 21:25
So now that we've firmly established how little we actually know about R520, anyone care to speculate about NV_ENGR1 (http://www.pcisig.com/developers/compliance_program/integrators_list/pcie/)? :?
I think this is some kind of a new PCIE chip from NVIDIA 8)

Geo
13-Apr-2005, 21:41
Richard Huddy did not promise super branching performance. The words were "..with decent performance..."

Right. And we all know what everyone thinks of NV40's branching.

So, put a number on your expectation for "decent performance" of branching relative to NV40. 50% faster? It seems to me that everyone agrees the bar is so low on NV40 that still won't get you there. 100% faster? Might still not be enuf. My benchmark is the kind of hurt (percentage-wise) that R300 put on GF4 re AA/AF. Seems to me that was the watershed where AA went from "some people, some of the time" to "darn near everybody, darn near all of the time --just a matter of how much, not whether".

tEd
13-Apr-2005, 22:05
Richard Huddy did not promise super branching performance. The words were "..with decent performance..."

Right. And we all know what everyone thinks of NV40's branching.

So, put a number on your expectation for "decent performance" of branching relative to NV40. 50% faster? It seems to me that everyone agrees the bar is so low on NV40 that still won't get you there. 100% faster? Might still not be enuf. My benchmark is the kind of hurt (percentage-wise) that R300 put on GF4 re AA/AF. Seems to me that was the watershed where AA went from "some people, some of the time" to "darn near everybody, darn near all of the time --just a matter of how much, not whether".

I don't think i could say r520 is xx% faster doing DB than nv40 just as a general rule because there might be big differences between the tested situations.
I did not even see any detail testing done with branching on a nv40 yet.
Frankly the only test of DB i've seen so far is in shadermark and there it shows DB with better performance compared to the same situation without

blakjedi
13-Apr-2005, 22:24
Why do people think whatever is in the Xenon will functionally be any lesser than whats in the R600?

Jawed
13-Apr-2005, 22:30
Richard Huddy did not promise super branching performance. The words were "..with decent performance..."

Right. And we all know what everyone thinks of NV40's branching.

So, put a number on your expectation for "decent performance" of branching relative to NV40. 50% faster? It seems to me that everyone agrees the bar is so low on NV40 that still won't get you there.

NV40 appears to do branching in pixel shaders at some level of granularity larger than a quad. If R520 can do branching at the quad-level that's gotta be a major major win hasn't it?

The problem being, what developer will write code that takes advantage of quad-level granularity if NVidia hardware can't do it.

So, seriously, what are the chances that NVidia will leave low-granularity dynamic branching in NV40's successor? Seems unlikely to me. Otherwise, maybe we can start braying about how NVidia held back SM3 by making its key feature barely usable.

As far as the latency incurred by branching (i.e. when the code path is mis-predicted - I'm presuming there's lots of loop-unrolling in the compiler driver, for example) I can't imagine that R520 will offer any useful advantage in that sense - there's always going to be some latency there, and in the big picture of shaders that execute in 20-100 cycles, one or two cycles difference in latency between the two architectures is not going to make us all sit up and pay attention.

So, overall, I reckon R520 will prolly make NV40 look sick on dynamic branching, but NV40's successor should be in the same ballpark.
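To put a rough shape on the granularity argument above, here is a minimal sketch in Python. It is a toy cost model, not a description of NV40, R520 or any real GPU: the batch sizes, the `cheap` and `expensive` cycle counts, and the function name `shader_cost` are all assumptions made for illustration. The only idea it encodes is that pixels are processed in lockstep batches, so if any pixel in a batch takes a branch, every pixel in that batch pays for it.

```python
import random

def shader_cost(takes_expensive_path, batch_size, cheap=4, expensive=40):
    """Total cycles when a batch pays for every path that any of its pixels takes.

    takes_expensive_path: list of bools, one per pixel (the branch condition).
    batch_size: branching granularity in pixels (4 would be one quad).
    cheap / expensive: made-up per-pixel cycle costs of the two paths.
    """
    total = 0
    for start in range(0, len(takes_expensive_path), batch_size):
        batch = takes_expensive_path[start:start + batch_size]
        per_pixel = 0
        if any(batch):          # at least one pixel needs the expensive path
            per_pixel += expensive
        if not all(batch):      # at least one pixel needs the cheap path
            per_pixel += cheap
        total += per_pixel * len(batch)
    return total

if __name__ == "__main__":
    random.seed(0)
    n = 4096
    coherent = [i < n // 2 for i in range(n)]           # one clean edge
    noisy = [random.random() < 0.5 for _ in range(n)]   # per-pixel noise
    for size in (4, 64, 1024):
        print(size, shader_cost(coherent, size), shader_cost(noisy, size))
```

Under these assumptions the coherent case costs about the same at every granularity, while the noisy case gets worse as the batch grows, which is the sense in which finer granularity would be a win only for incoherent branches.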

Jawed

Jawed
13-Apr-2005, 22:38
I did not even see any detail testing done with branching on a nv40 yet.
Frankly the only test of DB i've seen so far is in shadermark and there it shows DB with better performance compared to the same situation without

This is hardly a game:

http://graphics.stanford.edu/~yoel/notes/ - 21 February

but it shows problems. I believe he's using the technique Humus came up with in his early-Z demo (what's its name?) in order to solve the problem he's having with NV dynamic branching performance.

SC:CT apparently uses dynamic branching in a fairly global shader, but I'm pretty vague on that. Plainly performance there isn't a problem (SM3 mode with the same eye-candy as SM1 mode runs 5-8% faster I think).

So I think it's unfair to generalise about NV40's dynamic branching performance as there are scenarios where it works without a hitch. I just get the feeling developer's hands are somewhat tied by GPU architecture gotchas. But aren't they always?

Jawed

Jawed
13-Apr-2005, 22:42
Why do people think whatever is in the Xenon will functionally be any lesser than whats in the R600?

Because R600 is a WGF2 part, and WGF2 hasn't been finalised (or at least won't have been finalised early enough for R500 in Xbox 360 aka Xenon).

R500 should be finished right now... prolly has been for a few months.

Jawed

Chalnoth
13-Apr-2005, 22:44
So, put a number on your expectation for "decent performance" of branching relative to NV40. 50% faster? It seems to me that everyone agrees the bar is so low on NV40 that still won't get you there. 100% faster? Might still not be enuf.
Given that the NV40 already shows significant performance improvement from dynamic branching in a few demos, I don't think this is really necessary.

ninelven
13-Apr-2005, 22:52
Indeed, I expect little improvement from nv40 as far as dynamic branching "speed" is concerned. Coherency seems to be a larger issue to me.

Geo
13-Apr-2005, 23:01
Dave mentioned in some other thread (or perhaps earlier in this one), and I agree, that going with a "radical" departure in architecture (ala R500) in a PC system where you need to run existing software that was designed to run on "traditional" architectures, is much more of a risk than doing it in a closed box upon which software is being designed specifically for it.

With such a relatively small difference (between SM 3.0 and 2.0), it doesn't make all that much sense to bring an entirely new architecture in. It makes the most sense when there is a more radical change in the API / programming model.

So I don't expect to see an R400-type architecture appear in the PC space until MS is ready to release WGF 2.0 / Longhorn.

Of course the problem with arguing with Dave is, as he once told DC --retrospectively-- that sometimes you just have to accept that maybe he knows something you don't. Well, easy for him to say, as he knows when he knows, and he knows when he's shrewd guessing on partial info, and he knows when he's S/WAGing with the rest of us. We gotta guess which mode he's in. :D

What I've always thot on this matter --which prima-facie means it's probably wrong :lol: -- is that you are faced with that transition hit no matter what you do, all you can control is timing. Tho admittedly we tend to throw the tiaras and brickbats around here based on how well you manage that timing as a company (in part, see NV30). But anyway, I look at most non-shader-limited games aimed at "traditional" architecture and I generally see modern cards kicking their ass big-time. So why not take part of the hit now, when you can mask it with a major process move that should give you enough performance margin to at least not "take a step back" on the traditional-aimed games while you daintily start (admittedly, you can't complete the journey) your way across the bridge?

Edit: I should add, based on some comments upstream, I have pegged Dave this time around as somewhere between SWAG (Sophisticated, etc) and "shrewd guessing on partial info", rather than "veiled ex-Cathedra". This makes me marginally willing to argue the toss with him this time --politely, of course. :D

PeterAce
14-Apr-2005, 00:55
This is hardly a game:

http://graphics.stanford.edu/~yoel/notes/ - 21 February

but it shows problems. I believe he's using the technique Humus came up with in his early-Z demo (what's its name?) in order to solve the problem he's having with NV dynamic branching performance.

SC:CT apparently uses dynamic branching in a fairly global shader, but I'm pretty vague on that. Plainly performance there isn't a problem (SM3 mode with the same eye-candy as SM1 mode runs 5-8% faster I think).

So I think it's unfair to generalise about NV40's dynamic branching performance as there are scenarios where it works without a hitch. I just get the feeling developer's hands are somewhat tied by GPU architecture gotchas. But aren't they always?

Jawed

I was under the impression (from Demirug's posts below) that the SM3 path for SC:CT was only using static branching on NV4X:

http://www.beyond3d.com/forum/viewtopic.php?p=468009#468009

http://www.beyond3d.com/forum/viewtopic.php?p=468891#468891

Unknown Soldier
14-Apr-2005, 07:11
http://www.theinquirer.net/?article=22486

You'd think they are a 24hr cable news outlet the way they operate whether they have anything to say or not.

I believe TheInq is just covering their bases. The more info they put out .. however inaccurate it might be .. at least one might be right .. and then they'll shout that they said it first.

US

hoom
14-Apr-2005, 07:26
So R400 Forever architecture is waiting for the Windows Forever operating system and the killer app Duke Nukem Forever :?

tEd
14-Apr-2005, 08:12
I assume SC:CT is using DB for shadow map filtering/soft-shadows

jvd
14-Apr-2005, 08:14
So R400 Forever architecture is waiting for the Windows Forever operating system and the killer app Duke Nukem Forever :?

r400 is gone , its offspring the r500 will be seen later this year in the xbox 2 and a pc part the r600 will be released most likely next year in 2006

Demirug
14-Apr-2005, 08:52
This is hardly a game:

http://graphics.stanford.edu/~yoel/notes/ - 21 February

but it shows problems. I believe he's using the technique Humus came up with in his early-Z demo (what's its name?) in order to solve the problem he's having with NV dynamic branching performance.

SC:CT apparently uses dynamic branching in a fairly global shader, but I'm pretty vague on that. Plainly performance there isn't a problem (SM3 mode with the same eye-candy as SM1 mode runs 5-8% faster I think).

So I think it's unfair to generalise about NV40's dynamic branching performance as there are scenarios where it works without a hitch. I just get the feeling developer's hands are somewhat tied by GPU architecture gotchas. But aren't they always?

Jawed

I was under the impression (from Demirug's posts below) that the SM3 path for SC:CT was only using static branching on NV4X:

http://www.beyond3d.com/forum/viewtopic.php?p=468009#468009

http://www.beyond3d.com/forum/viewtopic.php?p=468891#468891

One of the shaders uses dynamic branching. All the others use static branching only. Some use all 16 booleans.
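For readers less familiar with the distinction, here is a small Python sketch of static versus dynamic branching. It is an illustration only, not code from the game or from any driver: the function names, the flag assignments and the pixel values are hypothetical. The assumption it encodes is that a static branch tests a boolean constant set by the application before the draw call, so every pixel in that draw follows the same path, while a dynamic branch tests a value computed per pixel.

```python
NUM_BOOL_CONSTANTS = 16  # "all 16 booleans" from the post above

def static_path(flags):
    """Static branching: flags are per-draw constants, so the set of
    enabled code blocks is chosen once and is identical for every pixel."""
    return tuple(i for i, enabled in enumerate(flags) if enabled)

def dynamic_paths(pixel_values, threshold):
    """Dynamic branching: the condition is computed per pixel, so a single
    draw can contain pixels on different paths."""
    return {"expensive" if v > threshold else "cheap" for v in pixel_values}

if __name__ == "__main__":
    flags = [False] * NUM_BOOL_CONSTANTS
    flags[0] = flags[3] = True                  # hypothetical feature toggles
    print(static_path(flags))                   # blocks enabled for this draw
    print(2 ** NUM_BOOL_CONSTANTS)              # flag combinations one shader can cover
    print(dynamic_paths([0.1, 0.9, 0.4], 0.5))  # paths present within one draw
```

The practical upshot, under these assumptions, is that static branching never causes divergence inside a draw, which is why it carries none of the coherency concerns discussed for dynamic branching.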

PeterAce
14-Apr-2005, 10:02
Thanks for the info Demirug.

May I ask - which effect is the dynamic branching shader being used for in SC:CT?

Demirug
14-Apr-2005, 10:06
Thanks for the info Demirug.

May I ask - which effect is the dynamic branching shader being used for in SC:CT?

I am not sure. I believe it is used for the softshadows.

Megadrive1988
14-Apr-2005, 10:38
Why do people think whatever is in the Xenon will functionally be any lesser than whats in the R600?

Because R600 is a WGF2 part, and WGF2 hasn't been finalised (or at least won't have been finalised early enough for R500 in Xbox 360 aka Xenon).

R500 should be finished right now... prolly has been for a few months.

Jawed


I believe that the Xenon GPU (R500) will do some things that R600 won't do, and R600 will do some things that the Xenon GPU won't do, even though both GPUs are most likely based on the same architecture (a re-worked, re-tooled and beefed-up R400).

not unlike NV2A and NV25. each had some things that the other did not have. NV2A had more shader ALUs (right DaveB?) and some geometry features that didn't show up on the PC side until NV30. and the NV25 had things that NV2A did not have like totally new anti-aliasing unit, AccuView, among other things.

Unknown Soldier
14-Apr-2005, 11:06
Another month and a half to go .. how depressing.

I see that the Multi-VPU may be talked about at WinHEC. (http://www.xbitlabs.com/news/video/display/20050412124323.html)

ATI's Multi-VPU Technology May be Touted at WinHEC

ATI's technology that would allow two or more graphics cards in a personal computer to render a single frame in 3D games in parallel thus increasing performance and quality may be discussed later this month during WinHEC show in Seattle, Washington.

“PCI Express is also returning the graphics subsystem to a general-purpose, highly scalable interface, which brings new opportunities to scale graphics performance by adding additional graphics cards to a system. Today, graphics industry leaders NVIDIA and ATI are offering graphics solutions that leverage the power of multiple GPUs in a single system,” reads a description of the session called “PCI Express: Spurring New Ideas in Graphics”.

“There is nothing new… We’ve been doing dual GPU for a long time. Right now there are E&S systems with 16 ATI GPUs in it,” an ATI spokesperson said when asked for comment.

US

hoom
14-Apr-2005, 18:58
:roll: Gah I hate when people don't get the joke :?

Pete
14-Apr-2005, 19:46
I assume SC:CT is using DB for shadow map filtering/soft-shadows

What does Dave Baumann have to do with CT's shadows?

Oh, riiiight....

ANova
14-Apr-2005, 21:07
Wow, over 61000 views and almost 1000 posts. And not a shred of fact.

Vegetto
14-Apr-2005, 21:10
Wow, over 61000 views and almost 1000 posts. And not a shred of fact.

and that's something new? the same happened with NV40 :P, the only difference was, that topic had some info and it was right :P

Pete
14-Apr-2005, 21:17
Heh, how about a repeat performance, Veg? :)

Vegetto
14-Apr-2005, 21:22
Heh, how about a repeat performance, Veg? :)

:oops: i don't get it, are u being mean? :oops:

DegustatoR
14-Apr-2005, 21:24
There is some info that is right in this topic 8)

ANova
14-Apr-2005, 21:27
There is some info that is right in this topic 8)

If you consider a bunch of tech nerds arguing over speculation and rumors right then yes.

Vegetto
14-Apr-2005, 21:32
There is some info that is right in this topic 8)

yeah is right that the R520 is a R300 pushed to the limits :lol:

Ostsol
14-Apr-2005, 21:45
There is some info that is right in this topic 8)

yeah is right that the R520 is a R300 pushed to the limits :lol:

I still don't understand why people say that. . . :roll:

Dave Baumann
14-Apr-2005, 22:06
Actually, I've heard people from ATI refer to it as such albeit glibly; however, it should mark a radical development of it. Personally I look at the R300 architecture lineage as the DX9 architecture and they will move to a unified architecture for WGF2.0.

Vegetto
14-Apr-2005, 22:09
Actually, I've heard people from ATI refer to it as such albeit glibly; however, it should mark a radical development of it. Personally I look at the R300 architecture lineage as the DX9 architecture and they will move to a unified architecture for WGF2.0.

That was what i meant :)

Pete
15-Apr-2005, 00:33
Heh, how about a repeat performance, Veg? :)

:oops: i don't get it, are u being mean? :oops:

Not at all! You brought us early NV40 numbers, didn't you (the ones with the [at-the-time] improbably high single-textured fillrate)?

BTW, what happened to the EX in your name? Or maybe I'm confused about that, too.

Geo
15-Apr-2005, 01:12
Actually, I've heard people from ATI refer to it as such albeit glibly; however, it should mark a radical development of it. Personally I look at the R300 architecture lineage as the DX9 architecture and they will move to a unified architecture for WGF2.0.

Ah, a little tease that makes everyone happy! :lol:

Geo
15-Apr-2005, 01:16
Heh, how about a repeat performance, Veg? :)

See, and some of you guys thot that starting an NV Next thread wouldn't go anywhere! :) Now we just need to lure him over there to post and not just read (and no doubt laugh). . .

Edit: Whupsie, he already did! Line 'em up, boys, "improved branching" for the House! :D

Tridam
15-Apr-2005, 04:09
Thanks for the info Demirug.

May I ask - which effect is the dynamic branching shader being used for in SC:CT?

I am not sure. I believe it is used for the softshadows.

Yes it's used for the softshadows (the coherency is good enough in that case for a performance gain). It's the only dynamic branching used in the game.

Chalnoth
15-Apr-2005, 04:43
One other major scenario that might use dynamic branching, limiting light calculation based upon distance (see Humus' stencil-based dynamic branching demo), should also have enough coherency for a performance improvement. If his demo were modified for PS3, it is unlikely that it would be as fast as the stencil-based technique, due to the low-poly nature of the scene, but it should still be better than no dynamic branching.
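As a rough illustration of the distance-based early-out described above, the following Python sketch counts how many full lighting evaluations a scene needs with and without a per-pixel range test. Everything concrete in it is an assumption for illustration: the 32x32 grid, the two lights and their radii, and the function name `lighting_evaluations`. It also ignores batch granularity entirely, so the saving it shows is a best case, not a prediction for any particular hardware.

```python
def lighting_evaluations(pixels, lights, use_branch):
    """Count full lighting evaluations over all pixel/light pairs.

    pixels: list of (x, y) positions.
    lights: list of (x, y, radius) tuples.
    use_branch: if True, a light is skipped for pixels outside its radius.
    """
    count = 0
    for px, py in pixels:
        for lx, ly, radius in lights:
            in_range = (px - lx) ** 2 + (py - ly) ** 2 <= radius ** 2
            if in_range or not use_branch:
                count += 1
    return count

if __name__ == "__main__":
    pixels = [(x, y) for y in range(32) for x in range(32)]
    lights = [(8, 8, 6), (24, 20, 6)]  # hypothetical (x, y, radius) lights
    print(lighting_evaluations(pixels, lights, use_branch=False))
    print(lighting_evaluations(pixels, lights, use_branch=True))
```

Because pixels near a light tend to be neighbours on screen, the in-range condition is spatially coherent, which is why this kind of branch is expected to pay off even when branching happens in coarse batches.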

Pete
15-Apr-2005, 06:17
See, and some of you guys thot that starting an NV Next thread wouldn't go anywhere! :) Now we just need to lure him over there to post and not just read (and no doubt laugh). . .

Edit: Whupsie, he already did! Line 'em up, boys, "improved branching" for the House! :D

:)

Unknown Soldier
15-Apr-2005, 06:21
Wow, over 61000 views and almost 1000 posts. And not a shred of fact.

Hmm Fact. The PCB is RED

Sorry Digi .. took your line.

US

Vegetto
15-Apr-2005, 06:22
Heh, how about a repeat performance, Veg? :)

:oops: i don't get it, are u being mean? :oops:

Not at all! You brought us early NV40 numbers, didn't you (the ones with the [at-the-time] improbably high single-textured fillrate)?

BTW, what happened to the EX in your name? Or maybe I'm confused about that, too.

About the EX well my account died... because i didn't post in a long time, reading is so much funnier :oops:

Anyway i have been busy playing with some toys and trying not to fry something ( like an epox motherboard :cry: )

let's wait a few weeks, it's not that hard. And plz god make may come faster


:twisted:

hovz
15-Apr-2005, 06:29
nearing 50 pages, no useful info. i love this... :roll:

dizietsma
15-Apr-2005, 06:51
Heh, how about a repeat performance, Veg? :)

:oops: i don't get it, are u being mean? :oops:

Not at all! You brought us early NV40 numbers, didn't you (the ones with the [at-the-time] improbably high single-textured fillrate)?

.

I particularly remember the 3dmark2001 number of above 36k ... wishful thinking I'm afraid.

neliz
15-Apr-2005, 08:07
I particularly remember the 3dmark2001 number of above 36k ... wishful thinking I'm afraid.

There was already a photo of it on techreport, running 3dmark01 and scoring close to 50k... :)

Geo
15-Apr-2005, 10:02
nearing 50 pages, no useful info. i love this... :roll:

Yes, but it would only be 45 pages without these. 8)

CMAN
16-Apr-2005, 03:37
I don't remember seeing it before, but what do people think they are going to call the card? Radeon XI800? I just can't think of how they are going to increment without sounding really silly. Any ideas?

Pete
16-Apr-2005, 04:01
Moving to SM3 is probably as good a time as any to revamp the naming convention, unified shaders or not.

Chalnoth
16-Apr-2005, 04:39
Some have suggested Radeon Z800 and family. Makes sense to me.

Geo
16-Apr-2005, 06:06
Some have suggested Radeon Z800 and family. Makes sense to me.

There are already a couple Z800 tech devices (speakers and a phone) around.

Chalnoth
16-Apr-2005, 06:09
Sure, but as a result of, if I remember correctly, a lawsuit between Intel and AMD (back when AMD was basically reverse engineering Intel's 486 and previous processors), a number isn't a valid registered trademark. Thus Intel started naming their processors.

ANova
16-Apr-2005, 07:19
Could just be the X900.

Fodder
16-Apr-2005, 08:09
That would be even worse than calling the X800 the 9900.

silence
16-Apr-2005, 13:27
Could just be the X900.

X11xx? makes some sense, more than X900.

EDIT :: i meant X1xxx series.... :oops:

pc999
16-Apr-2005, 14:03
XI800

tEd
16-Apr-2005, 14:24
Radeon Rampage insert number you like :lol:

Geo
16-Apr-2005, 19:10
There's a relatively newbie Canada-based fellow over at R3D who seems flatly confident of a May 10th release date. He's also talking about 24 pipes as good as 32 and adds that little touch of detail that makes these out-of-left-field pronuncios just credible-looking enough to wonder about, "Marketing will dub it a 24 pipeline stage."

So, looking at ATI's calendar, it appears they have one US based and one UK based conference going on simultaneously on May 10. That pattern look familiar to anyone? :)

However, pipelines aside, what about the release date?

The UK conference appears to be HDTV focused ("Mediacast UK"), but would still serve as a cover to have senior-type personnel in town to have a side announcement outside the conference itself of R520. Going a couple days before the Xbox 2 announcement on May 12 might even be a concession to MS, rather than stepping on their attention by going shortly after. And if R520 is significantly "less exciting" than R500, ATI might see advantage as well in "building" the announcements rather than appearing to "go backwards" by following exciting with less-exciting.

The one in the US is something called "BE" in Baltimore. Doesn't look particularly techie. Having visited their site www.be.org I'm still not exactly sure what the heck it is.

So, credible or not?

Chalnoth
16-Apr-2005, 19:21
From what I understand, ATI is expected to release an entire range of products based on the R520, so an HDTV-centric conference may make sense (for the low-end products).

neliz
16-Apr-2005, 20:09
Over at VR-zone they also say that the RD400 (AMR Ready) is being released mid-may..

http://www.vr-zone.com/?i=2009&s=1

that does make sense with the may 10th date mentioned HERE and over at R3d...

trinibwoy
17-Apr-2005, 00:56
Over at VR-zone they also say that the RD400 (AMR Ready) is being released mid-may..

http://www.vr-zone.com/?i=2009&s=1

that does make sense with the may 10th date mentioned HERE and over at R3d...

I guess that answers the question of whether an ATI chipset is required for AMR. If that's the case then we're going to move from choosing ATI/Nvidia cards to ATI/Nvidia systems....

Jawed
17-Apr-2005, 01:28
Over at VR-zone they also say that the RD400 (AMR Ready) is being released mid-may..

http://www.vr-zone.com/?i=2009&s=1

that does make sense with the may 10th date mentioned HERE and over at R3d...

I guess that answers the question of whether an ATI chipset is required for AMR. If that's the case then we're going to move from choosing ATI/Nvidia cards to ATI/Nvidia systems....

I don't think that's necessarily so, since one of the possibilities for MVP has been described as "software only", merely requiring that the mobo has two or more PEGx16 compatible slots (e.g. one slot might be only 4 lanes, but has an open end so that a PEGx16 board can fit).

Jawed

Dave Baumann
17-Apr-2005, 12:03
I guess that answers the question of whether an ATI chipset is required for AMR. If that's the case then we're going to move from choosing ATI/Nvidia cards to ATI/Nvidia systems....

From what I've heard so far, I don't think that is the case. Hopefully the opposite will happen and ATI's will actually work on other platforms as well, which could promote more "certification" for other platforms on both sides.

DegustatoR
17-Apr-2005, 12:20
I guess that answers the question of whether an ATI chipset is required for AMR. If that's the case then we're going to move from choosing ATI/Nvidia cards to ATI/Nvidia systems....
The only thing preventing SLI from working on any board with dual PCIe x16 slots is the current ForceWare drivers.

trinibwoy
17-Apr-2005, 13:00
I like the optimism :D At least May 10 isn't so far away.

Dave Baumann
17-Apr-2005, 13:30
If you think about it, ATI doesn't have anywhere near the compelling presence as far as chipsets are concerned, so tying their crossboard/chip rendering scheme to their own platform may not be the best thing, at this point in time, in terms of adoption. So, if/when they fire it up it'll be better for them to make sure it works on as wide a number of platforms as possible.

Geo
17-Apr-2005, 15:11
If you think about it, ATI doesn't have anywhere near the compelling presence as far as chipsets are concerned, so tying their crossboard/chip rendering scheme to their own platform may not be the best thing, at this point in time, in terms of adoption. So, if/when they fire it up it'll be better for them to make sure it works on as wide a number of platforms as possible.

If this is a hint, then Yay! If it is pure analysis, then I suppose that depends on whether the primary goal is to sell more cards or to use the lure of SLI/AMR/MVP to sell more chipsets. We've already seen with NV some behavior meant to force SLI users to a particular, more expensive NV chipset.

Tho given the relative realities, I would think there would be some attractiveness to sticking a thumb in NV's eye a bit by denying them some card sales to go along with the chipset sale. OTOH, the "ATI is coming!" on this front has been building for quite some time --are we sure that NV hasn't put an anti-ATI poison pill in their SLI chipset? Being relatively newish and low-volume, that kind of behavior wouldn't surprise me --tho if it becomes a significant chunk of the market over time I'd expect market pressures from the builders to eventually force them to play nice together.

The reverse --whether NV SLI will work on ATI chipset boards will also be interesting --do we know that yet?

Geo
17-Apr-2005, 15:21
Son of a gun. A bad joke got out of hand?

http://www.theinquirer.net/?article=22582

tEd
17-Apr-2005, 15:26
So what do you think about the kind of cooling solution it will have? Single slot, dual slot, triple slot even? What about the length: more of an x800pro kind of length, or gf6800GT/Ultra-ish?

Ailuros
17-Apr-2005, 16:04
So what do you think about the kind of cooling solution it will have? Single slot, dual slot, triple slot even? What about the length: more of an x800pro kind of length, or gf6800GT/Ultra-ish?

I personally doubt that it'll turn out any bigger than the current R4xx boards, could be wrong though.

MuFu
17-Apr-2005, 16:24
Son of a gun. A bad joke got out of hand?

http://www.theinquirer.net/?article=22582

Dunno. Sounds like quite a clever way to try and minimise leaks if you ask me.

So what do you think about the kind of cooling solution it will have? Single slot, dual slot, triple slot even? What about the length: more of an x800pro kind of length, or gf6800GT/Ultra-ish?

I personally doubt that it'll turn out any bigger than the current R4xx boards, could be wrong though.

Well we have already heard that the layout of the CeBit boards was quite similar to the NV40 ref design. I would assume the size is similar too.

Blazkowicz
17-Apr-2005, 16:24
Could just be the X900.

X11xx? makes some sense, more than X900.

EDIT :: i meant X1xxx series.... :oops:



I think it could be X1800, or simply 11800. You lose some "X factor" then, but it's still here with XT and XL

XI800 is too confusing I think :wink: , excessive mixing of Roman and Arabic numerals.

digitalwanderer
17-Apr-2005, 17:38
Son of a gun. A bad joke got out of hand?

http://www.theinquirer.net/?article=22582
I don't know, but I do know I want one of them shirts now.... 8)

Reverend
17-Apr-2005, 17:52
Seriously, other more important-to-site threads have been deleted and this one is still here?

Reverend
17-Apr-2005, 17:54
Wait, I meant to type "locked" and not "deleted".

Jawed
17-Apr-2005, 17:58
Look on page 49 for a measure of this thread's importance to this site.

Jawed

p.s. I saw the first version of your rant...

Bouncing Zabaglione Bros.
17-Apr-2005, 18:16
Seriously, other more important-to-site threads have been deleted and this one is still here?

Nothing wrong with keeping all the rampant speculation in one place. Otherwise people just spawn off similar threads all over.

digitalwanderer
17-Apr-2005, 18:29
Seriously, other more important-to-site threads have been deleted and this one is still here?
Apparently. Don't it suck when the mods/admins do something you don't like and without explanation? ;)

tEd
17-Apr-2005, 18:41
Well we have already heard that the layout of the CeBit boards was quite similar to the NV40 ref design. I would assume the size is similar too.

Well then it won't fit in my machine :x

Doesn't matter no money for upgrading anyway :|

tEd
17-Apr-2005, 19:16
http://www.theinquirer.net/?article=22581

Yes, it's the Inq, but the words come directly from Dave Orton's mouth.

ATI will go from 90 to 80nm

DegustatoR
17-Apr-2005, 19:28
ATI will go from 90 to 80nm
Good for them! How 'bout goin' from 130nm to 90nm first?

tEd
17-Apr-2005, 19:30
ATI will go from 90 to 80nm
Good for them! How 'bout goin' from 130nm to 90nm first?

Only if you read the article

Geo
17-Apr-2005, 19:45
Seriously, other more important-to-site threads have been deleted and this one is still here?

http://www.fotosearch.com/PHD213/42030/

Dave Baumann
17-Apr-2005, 19:53
90nm and 65nm are major nodes. 80nm is likely just an optical shrink of 90nm, so it probably won't be any great shakes moving to it following 90nm.
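
As a rough illustration of why an optical shrink is a modest step next to a major node, here is a minimal sketch of the idealised arithmetic. It assumes the node names can be treated as literal linear feature sizes and that the whole die scales uniformly, which real processes rarely manage; the function name is made up for the example and nothing here is foundry data.

```python
# Idealised shrink arithmetic (assumption: every linear dimension scales by
# the ratio of the node names, so area scales by that ratio squared).

def ideal_area_scale(old_nm: float, new_nm: float) -> float:
    """Return the ideal die-area ratio when shrinking from old_nm to new_nm."""
    linear = new_nm / old_nm
    return linear * linear

half_node = ideal_area_scale(90, 80)  # 90nm -> 80nm optical shrink
full_node = ideal_area_scale(90, 65)  # 90nm -> 65nm major node

print(f"90nm -> 80nm: {half_node:.2f}x area")
print(f"90nm -> 65nm: {full_node:.2f}x area")
```

Under those assumptions the same design would ideally take about 0.79x the area at 80nm versus about 0.52x at 65nm.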

tEd
17-Apr-2005, 19:55
90nm and 65nm are major nodes. 80nm is likely just an optical shrink of 90nm, so it probably won't be any great shakes moving to it following 90nm.

Will it be low-k too, or like 110nm?

Jawed
17-Apr-2005, 19:56
The question is, why would ATI even be making noises about 80nm right now?

Jawed

Tim Murray
17-Apr-2005, 20:26
90nm and 65nm are major nodes. 80nm is likely just an optical shrink of 90nm, so it probably won't be any great shakes moving to it following 90nm.

Will it be low-k too, or like 110nm?
If it's an optical shrink of 90nm and all 90nm at TSMC is low-k, then yes, I would imagine that 80nm is also low-k.