Will R580 have 48 shader 'pipes?'

Will R580 have 48 shader fragment 'pipes?'

  • Yes, just like RV530 has 12.

    Votes: 86 60.6%
  • No way, it's just too many transistors.

    Votes: 56 39.4%

  • Total voters
    142
Rys said:
The pixel 'pipeline' isn't different. R580 is, pretty much, just wider; I find that the easiest way to think about it. That in turn allows many more ALU ops per pixel thread, which is RV530 and R580's main ethos: get ALU:TEX up really high.
But I don't intend for the debate to be about "pixel pipelines." The speculation is over "pixel shader processors," which I assume can be more than one per "pixel pipe," and can have more than one ALU each, I suppose.
The leaked ATI slides show 12 pixel shader processors for RV530, and many people subsequently say that RV530 has 12 "pipes."
I assume, Rys, that you would still say that in fact it doesn't (you said as much in a previous thread). But then how do so many people get away with saying RV530 has 12 PS "pipes?" Surely that claim could not be made for just more ALUs?
So what exactly are you saying?

Are you saying R580 will have 48 ALUs, but they will be bundled into 16 "wide" PS processors?
Thanks. :)
 
ERK said:
Are you saying R580 will have 48 ALUs, but they will be bundled into 16 "wide" PS processors?
Thanks. :)
Yep, that's it. You can push more ALU ops (3x) through a 580 pixel processor than you can on 520. Pixel processor count stays the same.
 
Ground...

Back in the day we had instructor-led classes to help our engineers deal with the issues on mobile products. One phrase that always stuck is "TANG", which loosely translates to There Ain't No Ground. The idea was that a lot of fresh-outs would see these ground symbols on a schematic and not realize that for a mobile device there is no mystical ground, as all of those nodes go right back to the power supply and can give you some nasty current loops to deal with. So yes, having a good "ground" is important :)
 
Rys said:
The main problems were electrical. That the chip is big, or complex, really isn't the issue, although it is related. It's essentially the endeavour to get good electrical signal quality, a solid ground plane and predictable current flow into and out of the chip. That's always the case no matter what the size or transistor count (which is such an asshat metric for me).
But why did they have grounding problems?
 
ERK said:
But I don't intend for the debate to be about "pixel pipelines." The speculation is over "pixel shader processors," which I assume can be more than one per "pixel pipe," and can have more than one ALU each, I suppose.
The leaked ATI slides show 12 pixel shader processors for RV530, and many people subsequently say that RV530 has 12 "pipes."
I assume, Rys, that you would still say that in fact it doesn't (you said as much in a previous thread). But then how do so many people get away with saying RV530 has 12 PS "pipes?" Surely that claim could not be made for just more ALUs?
So what exactly are you saying?

Are you saying R580 will have 48 ALUs, but they will be bundled into 16 "wide" PS processors?
Thanks. :)

Although I think you understood it already, maybe these numbers help even more:

RV530 =

4 SIMD channels with 3 ALUs/channel (4*3=12)

R580 =

16 SIMD channels with 3 ALUs/channel (16*3=48)

Each SIMD channel has 3 ALUs, hence the 3:1 ALU:TMU ratio.

What I was trying to say in my former post is that since it's still unknown how many floating-point operations per clock each ALU in R580 will be capable of, I wouldn't call them fragment shaders. If each ALU is capable of 2*FP32 operations/clock, then we should be talking about 96 fragment shaders and not 48.
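
To make the arithmetic above concrete, here is a minimal Python sketch. The function names are made up for illustration, and the 2 ops/clock figure is only the "what if" from this post, not a known R580 spec.

```python
def alu_count(simd_channels: int, alus_per_channel: int) -> int:
    """Total ALUs = SIMD channels x ALUs per channel."""
    return simd_channels * alus_per_channel


def fp32_ops_per_clock(simd_channels: int, alus_per_channel: int,
                       ops_per_alu: int) -> int:
    """Total FP32 ops per clock, given a hypothetical per-ALU issue rate."""
    return alu_count(simd_channels, alus_per_channel) * ops_per_alu


rv530_alus = alu_count(4, 3)    # 4 channels x 3 ALUs
r580_alus = alu_count(16, 3)    # 16 channels x 3 ALUs

# Assumption from the post: IF each R580 ALU could do 2 FP32 ops per clock.
r580_ops_if_dual_issue = fp32_ops_per_clock(16, 3, ops_per_alu=2)

print(rv530_alus, r580_alus, r580_ops_if_dual_issue)
```

The point is only that the headline number depends on what you decide to count: ALUs, or operations per clock.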
 
In terms of major nodes, 65. 80 is 90's optical shrink though, which is what's next in real terms. Both are correct in a sense.
 
Ailuros said:
Albeit I think you understood it already, maybe those numbers help even more:

RV530 =

4 SIMD channels with 3 ALUs/channel (4*3=12)
RV530 is 3 quads of pixel processors.

I'm expecting each shader quad to be separately schedulable - i.e. three independent batches of fragments can be processed simultaneously.

(A fourth batch will be in the quad texture pipes.)

Jawed
 
I'm really torn about this (guess I'll just have to wait until tomorrow ;) ). What Rys says is usually credible, of course, but what Jawed says makes sense too.

Wouldn't the wide pixel shader with 3 ALUs be a compiler nightmare?
In that light, I might just have to side with Jawed, still.

Hmmm...
Looking at the sell sheets again, it does say:

X1300 4 shader UNITS
X1600 12 shader PROCESSORS
X1800 16 shader UNITS

A significant distinction?
 
When has ATI ever released a refresh that is different from the original?
Add to that the fact that they already have it in house, so it's most likely

a) a faster R520
b) different process - 80nm

While it would be great - don't expect a new design.
 
Graphics_Krazy said:
When has ATI ever released a refresh that is different from the original?
Add to that the fact that they already have it in house, so it's most likely

a) a faster R520
b) different process - 80nm

While it would be great - don't expect a new design.
Do you like surprises? ;-)
 
Graphics_Krazy said:
When has ATI ever released a refresh that is different from the original?
Add to that the fact that they already have it in house, so it's most likely

a) a faster R520
b) different process - 80nm

While it would be great - don't expect a new design.

What if I told you that you were 100% wrong? It's faster, but it's not an R520. And no, it's not a different process; that's the R590, which is going to be used for cheaper production, and most likely not for a speed increase.
 
ERK said:
Looking at the sell sheets again, it does say:

X1300 4 shader UNITS
X1600 12 shader PROCESSORS
X1800 16 shader UNITS

A significant distinction?

If these sheets are legit, I would say so. Surely they would be prepared with carefully chosen words by marketing. There would be no room for creative wording.

It also makes sense. I would say that a unit can have several processors, but a processor cannot be several units. In the case of the X1300 and the X1800 there is one unit per pipeline. The X1600 has 3 independent processors per pipeline. If the X1600 were labeled in units, it would have only 4. Of course they want to emphasize the number in this case, so they state the number of (independent) processors instead.
 
ERK said:
Looking at the sell sheets again, it does say:

X1300 4 shader UNITS
X1600 12 shader PROCESSORS
X1800 16 shader UNITS

A significant distinction?
Now that strikes me as a nice find!

One issue I'm struggling with, right now, is how the scheduling of shader and texture pipes works out.

In both R520 and RV515 the number of shader pipes and texture pipes is the same - 16 in R520 and 4 in RV515. So scheduling is "easy".

A set of 16 fragments in R520 is either in the shader or texture pipes. And in RV515, a set of only 4 fragments is in one or the other.

(Obviously, other fragments will be queued, waiting their turn to be scheduled).

In RV530, with 12 shader pipes and 4 texture pipes, it's not possible for a set of 12 fragments to switch from being shaded to being textured - there's only room for 4 fragments in the texture pipes.

The way I see it, RV530 is 3-way MIMD across its shader cores. Sets of 4 fragments at a time (A, B and C) are each running separately. When B, say, needs to be textured, it goes back into the queue and waits its turn in the texture pipes.

So this leads to a distinction between units (implying a SIMD block across one or more quads of shaders - 4 quads in the case of R520) and processors, which would seem to be n-way MIMD at the base size for that architecture.

In RV530 the base size is 4 shader pipes, arranged as 3-way MIMD.

In R580, I'm guessing, the base size is 16 shader pipes, again arranged as 3-way MIMD.

Xenos is 3-way MIMD with a base size of 16 shader pipes. There are 16 texture pipes in Xenos.

In general it seems that the number of texture pipes defines the base size for the architecture. Multiples of that size are required.

In R520 and RV515 the multiplier is 1. In RV530 and R580 it's 3. The former pair are simpler to schedule, hence "units".

All speculation...

Jawed
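
The "base size times multiplier" idea in this post can be written down as a small Python sketch. Everything here is as speculative as the post itself: the class and field names are made up, the R580 entry is a guess, and the "units" vs "processors" rule is just the reading of the sell sheets proposed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    name: str
    texture_pipes: int  # the "base size" of the architecture
    multiplier: int     # independently scheduled shader groups (n-way MIMD)

    @property
    def shader_pipes(self) -> int:
        # Shader pipes come in multiples of the base size.
        return self.texture_pipes * self.multiplier

    @property
    def label(self) -> str:
        # Speculative rule: multiplier 1 -> "units", otherwise "processors".
        return "units" if self.multiplier == 1 else "processors"


CHIPS = [
    Chip("RV515", texture_pipes=4, multiplier=1),
    Chip("RV530", texture_pipes=4, multiplier=3),
    Chip("R520", texture_pipes=16, multiplier=1),
    Chip("R580", texture_pipes=16, multiplier=3),   # guess, per the post
    Chip("Xenos", texture_pipes=16, multiplier=3),
]

for chip in CHIPS:
    print(f"{chip.name}: {chip.shader_pipes} shader {chip.label}")
```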
 
So you don't think R580 could be 4-way MIMD for textures and 12-way for PS (keeping everything in quads)?
Seems like it would not really have to have the complexity of "MIMD" though, right (in the CPU, VLIW sense)? It's more like parallel threads, multi-cores.
It seems a smaller change to the scheduler (between R520 and R580) to keep the TEX/PS 'core' functionality the same, but just have more parallelism: i.e. 4 TEX cores and 12 PS cores, for 16 threads executing at once (on four pixels each). And batches may not have to enter into it so much.

(I'm about two levels too deep for myself right now, don't shoot me!):oops:
 
Jawed said:
In R520 and RV515 the multiplier is 1. In RV530 and R580 it's 3. The former pair are simpler to schedule, hence "units".

Wouldn't it make more sense for the R520's multiplier to be 2?

The GTX has 2 "shader processors" per pipeline and the R420 had 1 and a bit. With only one shader processor per pipeline in the R520, it wouldn't be significantly faster than the RV530 (16 vs 12), which doesn't seem like much of a gap between a mainstream and a high-end card. Also, a 3x difference in R580 is a hell of a lot; 1.5x seems more realistic.

Or to put it another way, Xenos has 48 shader processors, the GTX has 48, the R580 is expected to have 48, but the R520 only has 16??
 
I'm suggesting that R520 and RV515 are boring SIMD - R520 is 16 pipes all running in a common shader state - the same program counter, essentially - on 16 fragments. RV515 only has 4 shader pipes, so we're used to the concept of 4 pipes SIMD. 16 pipes SIMD in R520 is a bit more surprising, though.

In RV530/R580 I'm expecting there to be 3 distinct shader states executing at the same time in the shader pipelines. So that's 3-way MIMD, with each group of pipes running 4-way SIMD (RV530) or 16-way SIMD (R580). In other words, in R580 there are 3 program counters, each of which controls "16 active" fragments at one time.

In all of these GPUs, the texture pipes are in a single SIMD group running a common shader state. R520/R580 can execute 16 texture operations simultaneously. RV515/530 execute 4 texture operations simultaneously.

I hope that clarifies what I meant :oops: Right now, I'm just guessing, though - trying to put it all together.

Jawed
 
pjbliverpool said:
Or to put it another way, Xenos has 48 shader processors, the GTX has 48, the R580 is expected to have 48, but the R520 only has 16??
Yep! R520 was supposed to hit back in May/June - before RV515 and RV530 (September?) and before R580 (December?)

It seems to me that R520 is a simple introduction of the ultra-threading concept to the PC space.

The scheduler in R520 only has to arbitrate between 2 batches:
  • a batch of fragments that needs the ALUs
  • a batch of fragments that needs the TMUs
In RV530/R580 the scheduler is more complex because it has to arbitrate between 4 batches:
  • 3 batches of fragments that each need the ALUs
  • a batch of fragments that needs the TMUs
Jawed
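
Here is a toy Python model of the two arbitration cases listed above. It is a sketch under heavy assumptions, not how the hardware works: every op takes one cycle, there is no texture latency, batches are served in fixed priority order, and the `simulate` function and the "AAAT" programs are invented for illustration.

```python
def simulate(programs, alu_groups, tmu_groups=1):
    """Count cycles needed to finish every batch.

    programs: one string per batch, made of 'A' (ALU op) and 'T' (texture op).
    Each cycle, at most `alu_groups` batches may execute an 'A' and at most
    `tmu_groups` batches may execute a 'T'. Blocked batches wait in the queue.
    """
    pcs = [0] * len(programs)  # one program counter per batch
    cycles = 0
    while any(pc < len(prog) for pc, prog in zip(pcs, programs)):
        free = {"A": alu_groups, "T": tmu_groups}
        for i, prog in enumerate(programs):
            if pcs[i] == len(prog):
                continue              # this batch has finished
            op = prog[pcs[i]]
            if free[op] > 0:          # a matching group of pipes is idle
                free[op] -= 1
                pcs[i] += 1
        cycles += 1
    return cycles


# Four batches, each running a 3:1 ALU:TEX program.
batches = ["AAAT"] * 4
print("1 ALU group + TMUs (R520-style): ", simulate(batches, alu_groups=1))
print("3 ALU groups + TMUs (R580-style):", simulate(batches, alu_groups=3))
```

With a 3:1 ALU:TEX program the single ALU group is the bottleneck in the first case, which is the argument for tripling the shader groups while leaving the texture pipes alone.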
 
Most definitely "YES"

From x1x00 review:
Regular participants of our forums may well have been aware of some reasonably obscure numbering schemes for many months that were used to describe parts of the performance characteristics, such as 16-1-1-1 for R520, 4-1-3-2 for RV530 and 4-1-1-1 for RV515, but until now the exact meaning of these weren't known. With the specifications for each of the chips we can now derive that the first number equates to the number of ROP's, the second the number of texture units per ROP, the third the number of "shader pipes" per ROP, and finally the Z/Stencil multiplier per ROP - with these figures in mind, we'll let you consider the ramifications for parts in the R5xx series that are still to come...

Page 7, bottom paragraph.
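
The numbering scheme from that paragraph is easy to decode in Python. This is a minimal sketch: the `decode` helper and the field names are made up, treating the last figure as a per-ROP multiplier that gives a total is my reading of the quote, and the `16-1-3-1` string is purely hypothetical, chosen only to show what a 48-pipe part would look like in this notation.

```python
from typing import NamedTuple


class Config(NamedTuple):
    rops: int
    texture_units: int
    shader_pipes: int
    z_stencil: int


def decode(code: str) -> Config:
    """Decode 'ROPs - tex units/ROP - shader pipes/ROP - Z/stencil mult/ROP'."""
    rops, tex_per_rop, shader_per_rop, z_per_rop = (int(n) for n in code.split("-"))
    return Config(
        rops=rops,
        texture_units=rops * tex_per_rop,
        shader_pipes=rops * shader_per_rop,
        z_stencil=rops * z_per_rop,
    )


for name, code in [("R520", "16-1-1-1"), ("RV530", "4-1-3-2"), ("RV515", "4-1-1-1")]:
    print(name, decode(code))

# Hypothetical code, NOT from the review: a 16-ROP part with 3 shader pipes per ROP.
print("hypothetical", decode("16-1-3-1"))
```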
 