eight shader units for R520

Mark · Dec 10, 2004

DaveBaumann said:
Unified pipelines from ATI will exist before WGF, but not on the PC.

Damn Dave that's pretty cut and dry, especially coming from you. What happened to "Yes"?

Hyp-X · Dec 10, 2004

digitalwanderer said:
Current rumors are saying that the R520 will be to the R420 what the R420 was to the R300, 'cept maybe a little more.

Unlikely.

I think the performance increase won't be that large, but there will be more features (FP32 shaders, FP16 filtering & blending, checkbox SM3.0 support).

It should be pretty much the same as NV47 (no DST though, unlike in R500).

{Sniping}Waste · Dec 10, 2004

DaveBaumann said:
Unified pipelines from ATI will exist before WGF, but not on the PC.

Looks like the XBOX2 is going to be the test bed for unified pipelines then. So if it works great then the PC would see it with R580 as a refresh of R520, or with R6XX.

Dave Baumann · Dec 10, 2004

No, since it will be a significantly different architecture. What this should tell you is there is a difference in emphasis and features between R520 and the XBox.

The reason for not doing it at the moment is that under SM3.0, IIRC, not all shader functionality is the same yet, and certainly the usage patterns are nowhere near similar, so the argument for implementing it now isn't as great. Conversely, the XBox 2 is a closed box environment where the API can be tailored towards the specifics of the hardware and developer evangelism can be done in such a way as to maximise the usage of that hardware - the potential payoffs (for a similar sized chip) can be greater in this environment immediately. When SM4.0 is about, which will have the same functionality across both VS and PS, the argument for the PC becomes much greater (and you'll also have process improvements to pack more processing units in).

991060 · Dec 10, 2004

Dave, when talking about SM4.0 stuff, are you sure about everything? I mean that we all know you have better intel than most of us, can you tell us whether SM4.0 or WGF2.0 is finalized yet, if no more info is allowed because of NDA.

Demirug · Dec 10, 2004

991060 said:
Dave, when talking about SM4.0 stuff, are you sure about everything? I mean that we all know you have better intel than most of us, can you tell us whether SM4.0 or WGF2.0 is finalized yet, if no more info is allowed because of NDA.

I am not Dave but the SM for WGF is very simple.

The interface is HLSL and there are no resource limits. Every shader has to run.

kemosabe · Dec 10, 2004

What assumptions support the theory that 24 pipelines would be the ceiling for R520? Curious to hear the opinions of the more technically schooled.

loekf2 · Dec 10, 2004

{Sniping}Waste said:
DaveBaumann said:

Unified pipelines from ATI will exist before WGF, but not on the PC.

Looks like the XBOX2 is going to be the test bed for unified pipelines then. So if it works great then the PC would see it with R580 as a refresh of R520, or with R6XX.

Not on XBOX2 aka Xenon I think. If the XBOX2's GPU is called R500... I expect on the IC architecture level some similarities between the two...

Can the XBOX2 GPU be a R520 with less pipes, but with SM3.0 ?

What if unified shaders pop up in this Nintendo "thingie" ATI is also cooking up somewhere ?

Dave Baumann · Dec 10, 2004

kemosabe said:
What assumptions support the theory that 24 pipelines would be the ceiling for R520? Curious to hear the opinions of the more technically schooled.

Given the GS report that suggests die size will decrease...

loekf2 said:
Not on XBOX2 aka Xenon I think. If the XBOX2's GPU is called R500... I expect on the IC architecture level some similarities between the two...

Probably a poor assumption on this occasion (read my prior post).

991060 · Dec 10, 2004

Demirug said:
I am not Dave but the SM for WGF is very simple.

The interface is HLSL and there are no resource limits. Every shader has to run.

Well, that's still too abstract for me. How about topology ability in the vertex shader, integer instructions, single-pass cube map rendering, and what I'm most interested in: a more flexible I/O as promised in the WinHEC presentations?

I know I'm asking for too much, but if you know, please tell me.

karlotta · Dec 10, 2004

loekf2 said:
Not on XBOX2 aka Xenon I think. If the XBOX2's GPU is called R500... I expect on the IC architecture level some similarities between the two...

Can the XBOX2 GPU be a R520 with less pipes, but with SM3.0 ?...

Xbox 2 is an R500, a different chip than an R520.

Entropy · Dec 10, 2004

kemosabe said:
What assumptions support the theory that 24 pipelines would be the ceiling for R520? Curious to hear the opinions of the more technically schooled.

Die area/yield.
Assuming that ATI will use TSMC's new 90nm line, they will want decent yields. As you go from 130nm to 90nm, you would in theory get twice the number of components into the same area. Now, there is quite a bit of area on an R480 that doesn't have to change when increasing parallelism (such as driving off-chip I/O, 2D stuff, video stuff, et cetera); on the other hand, these don't always scale very well with lithographic shrinks anyway. Call this chip area constant between lithographic generations, for argument's sake. The reason we can accept a fairly substantial rounding error for these is that for a top of the line chip the bulk of the die area is in all probability devoted to the vertex and pixel shaders, and the associated circuitry to keep these fed.

So why won't these simply double?
Two reasons - the first is that you might want to add features and presumably increase precision. These changes cost gates. Enough that Dave Orton cited that as the reason why ATI didn't add such features at 130nm. Going to 90nm will presumably allow it, but will still claim significant die area leaving less for adding shader resources.
The other problem is power draw. TSMC utilises neither SOI nor strained silicon, so they don't look too well positioned as far as battling leakage losses goes. So if they just doubled the number of transistors, the power draw would increase dramatically. Not good, when you're already pushing the envelope in terms of power/cooling/noise/cost. The logical solution is to drop voltages, accept lower frequencies, and let increased parallelism bail you out in terms of performance. So ATI is almost sure to add pipes, since this is the best way they can ensure that they will get reasonable performance increases to help sell the product - necessary to justify high prices before content starts to demand the additional features. 24 pipes is a 50% increase - no factor of two, but still respectable, and they won't have to endure a memory bandwidth starvation that is much worse than current levels.

That's my take - Yield/cost/power draw/memory tech conspire to make 24 pipes a reasonable guess for a 90nm high-end part from ATI with some advances in feature set and precision.
32 isn't impossible, but would imply increased die area and cost. The risk involved in moving to a new lithographic process would be increased further. And memory bandwidth issues would decrease efficiency.
16 is less likely due to marketing reasons. nVidia already offer SM3 and higher precision and will have had time to tweak their designs for yield and low cost. ATI will have to offer a significant performance delta. Also, the new part has to compete successfully performance wise with the 850XT, since the new features won't be all that significant in terms of being necessary for games for a long time yet. 16 pipes where the additional transistors are spent on features, and MHz gains are negligible due to leakage, just won't look all that compelling. But the rumours about decreased die area make it difficult to rule out 16 pipes completely.
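
[Editor's sketch] Entropy's area argument can be put into rough numbers. The snippet below is only a back-of-the-envelope model, not anything published by ATI or TSMC. It assumes ideal scaling (component density grows with the square of the feature-size ratio) and that the area devoted to shaders stays constant between generations, as the post suggests. The `feature_overhead` parameter is a hypothetical knob for the extra gates each pipe would need for FP32 and SM3.0; the value 0.4 used below is purely illustrative.

```python
def ideal_density_gain(old_nm: float, new_nm: float) -> float:
    """Ideal component-density gain from a lithographic shrink (area scales with length squared)."""
    return (old_nm / new_nm) ** 2


def pipes_in_same_shader_area(old_pipes: int, old_nm: float, new_nm: float,
                              feature_overhead: float) -> float:
    """Estimate how many pipes fit in an unchanged shader area.

    feature_overhead is a hypothetical fraction of extra gates per pipe
    (0.4 means each new pipe needs 40% more gates than an old one).
    """
    gain = ideal_density_gain(old_nm, new_nm)
    return old_pipes * gain / (1.0 + feature_overhead)


if __name__ == "__main__":
    # 130nm -> 90nm gives roughly a 2.09x ideal density gain.
    print(f"ideal gain: {ideal_density_gain(130, 90):.2f}x")
    # Illustrative assumption: each pipe grows by 40% to pay for new features.
    print(f"pipes at 40% overhead: {pipes_in_same_shader_area(16, 130, 90, 0.4):.1f}")
```

With no per-pipe overhead the model lands near the "simply double" case; any meaningful overhead pulls the estimate down towards the mid-twenties, and a smaller die would lower it further. Power draw and memory bandwidth, the other constraints in the post, are not modelled here.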

DegustatoR · Dec 10, 2004

Great analysis

But I thought that TSMC's 0.09 has at least low-k (IBM's has SOI for sure)?

Entropy · Dec 10, 2004

DegustatoR said:
Great analysis. But I thought that TSMC's 0.09 has at least low-k (IBM's has SOI for sure)?

Yes, TSMC's 90nm process will feature low-k, but then so does the 130nm process ATI already uses. To my knowledge, ATI does not utilize IBM for fabbing, and I have heard no rumors to that effect for the 90nm node. nVidia might though. IBM does have a process edge vs. TSMC at 90nm.

MuFu · Dec 10, 2004

DegustatoR said:
chavvdarrr said:

so... it looks like 24 pixel & 8 vertex pipes

That's NV47

Do you know whether they have kept a 1:1 PS:ROP mapping? Something like 6 PS + 4 ROP quads might make sense from a die size point of view (I'm assuming there are appreciable savings to be had, of course). It seems to have worked very well for them in NV43.

Dave Baumann · Dec 10, 2004

I would suggest that you'd need to consider why the discussion of 24 pipelines is interesting; what does it actually gain, especially when you figure in memory technologies - pixel fill-rate is hardly an issue now, is it? So you would only be looking to augment shader performance. Then consider the process adoption differences - why would there be a need to make more pipelines?

Geo · Dec 10, 2004

They are gonna take a pretty significant wallop to the tranny count on FP32 as it is. Plus more shader power, plus rumored flow-control to make branching performance actually acceptable. I wouldn't be the least bit surprised if R520 high-end was only 20 pipes given all the rest. Nor disappointed, so long as clock keeps going up.

Megadrive1988 · Dec 11, 2004

*R520 has VS & PS 3.0 or 3.0+

*R500 (Xbox2 aka Xenon) should have unified shaders and be somewhere beyond VS & PS 3.0 (3.0++ or 3.0+++) but not all the way to SM4.0 / WGF2.0 / DirectX10 / DirectX Next

*R600 / R6XX will be the first PC VPU from ATI that has unified shaders. VS & PS 4.0 or 4.0+ (aka SM4.0 right?) - WGF2.0 - DirectX10 aka DirectX Next.

R520 aka Fudo (not R500 / Xenon) for PC is probably going to have 24 pixel pipelines, maybe 32 but more likely 24 (like Nv47) and the 8 Vertex Shaders. hehe, I was hoping for 12 Vertex Shaders, which would boost R520 to over a billion verts/sec. the only way now, to reach that, assuming 8 Vertex Shaders, is clockspeed (Fast14?). but probably won't happen.

no doubt that R520 is gonna eat Nv47 for lunch, either way.

there was an article suggesting that ATI's implementation of SM3.0 will be superior to that of Nvidia's in NV40 (not taking into account Nv47's advancements if any)

okay I should stop talking out of my arse now, about things I have no idea about. i'm probably correct on a few things, though 8)
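
[Editor's sketch] The "billion verts/sec" remark above can be checked with simple arithmetic. The model below assumes peak vertex rate is just units × clock ÷ cycles per vertex. The `cycles_per_vertex` default of 4 and the 520MHz clock are illustrative assumptions, not figures from this thread, and real throughput depends heavily on shader length.

```python
def peak_vertex_rate(vs_units: int, clock_mhz: float, cycles_per_vertex: float = 4.0) -> float:
    """Peak vertices per second under a simple units * clock / cycles model (assumed, not measured)."""
    return vs_units * clock_mhz * 1e6 / cycles_per_vertex


def clock_needed_mhz(vs_units: int, target_rate: float, cycles_per_vertex: float = 4.0) -> float:
    """Clock in MHz needed for vs_units to reach target_rate vertices per second."""
    return target_rate * cycles_per_vertex / (vs_units * 1e6)


if __name__ == "__main__":
    # 12 vertex shaders at an assumed 520MHz clock.
    print(f"12 VS: {peak_vertex_rate(12, 520) / 1e9:.2f} Gverts/s")
    # Clock that 8 vertex shaders would need to hit one billion verts/sec.
    print(f"8 VS need {clock_needed_mhz(8, 1e9):.0f} MHz")
```

Under these assumptions, 12 units clear a billion easily, while 8 units have to make it up on clock speed, which matches the point made in the post.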

Jawed · Dec 11, 2004

I think Dave's hinting that R520 will be topping out at something like 400MHz due to first-cut 90nm process limits (thinking of heat density constraints, which seem to have been a factor in the first generation of AMD A64s at 90nm - which mean that the fastest current A64s are still using 130nm).

To offset the slowdown will require 24+ PSs. And then there's the extra transistors required for FP32, a new memory controller (architecturally ready for 512-bit and GDDR4?) and the gubbins required to make dynamic branching work efficiently, rather than being a check-box feature.

Erm...

Second generation R520 could see a decent bump in core clock (say 500MHz), 32 PSs and a 512-bit memory interface. ATI might be looking at this level of integration on 90nm as a ceiling for a few years yet (65nm would seem to be about 3 years away?), so it's this basic transistor budget and clock rate that will be accommodating unified shaders and SM4 in time for WGF 2.

Just idle speculation...

Jawed
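
[Editor's sketch] Jawed's trade-off between clock and pipe count can be illustrated by treating peak pixel shader throughput as pipes × clock. The 24 pipes at 400MHz and 32 pipes at 500MHz come from the post; the 16-pipe, 520MHz baseline is an assumed stand-in for a current high-end part, and the model ignores per-pipe efficiency and memory bandwidth.

```python
import math

BASE_PIPES = 16      # assumed baseline pipe count
BASE_MHZ = 520.0     # assumed baseline core clock, for illustration only


def relative_throughput(pipes: int, clock_mhz: float) -> float:
    """Peak shader throughput relative to the assumed baseline (pipes * clock)."""
    return (pipes * clock_mhz) / (BASE_PIPES * BASE_MHZ)


def pipes_to_match_baseline(clock_mhz: float) -> int:
    """Smallest pipe count that matches baseline throughput at a lower clock."""
    return math.ceil(BASE_PIPES * BASE_MHZ / clock_mhz)


if __name__ == "__main__":
    print(f"24 pipes @ 400MHz: {relative_throughput(24, 400):.2f}x baseline")
    print(f"32 pipes @ 500MHz: {relative_throughput(32, 500):.2f}x baseline")
    print(f"pipes needed to break even at 400MHz: {pipes_to_match_baseline(400)}")
```

On this simple measure, a 400MHz part needs more than 20 pipes just to stand still, which is why a clock ceiling of that order would push the design towards 24 or more.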

Megadrive1988 · Dec 11, 2004

hmmm, second generation R520.....you mean like a R580 or something higher than R520 but before R600. maybe. I highly doubt a 512-bit memory bus though.

maybe a boost to 10 or 12 Vertex Shaders and 32 pixel pipelines.

this would still be based on the now-old R3XX architecture. I doubt we will see much more than a R420 to R480 leap though. but perhaps you are right or partly right.

now, as for R500 in Xenon, assuming it has unified pipelines, I'd guess at no less than 24. perhaps as many as 48. probably 24 though. and 24 is 2 more than R420/R480 which has 22 (6vs + 16pp) but then Xenon would likely have less than R520 (if R520 has 8vs + 24pp). so I'm hoping for 48 unified shaders in Xenon. is that too much to ask for on 90 nm? Xenon VPU is likely to smash through the 300 million transistor barrier, at least, specially with 10 or more MB of embedded memory.....

crap, can't think anymore. gotta goto work.
