AMD: R7xx Speculation

Sound_Card · Mar 25, 2008

Razor1 said:
Well, you still have to have transistors for routing data on the larger processes. I remember the G70 to G71 shrink: that was one place where NV shaved off some transistors. I don't know how many, though, but it could be significant.

That was a mere 24 million transistors (and in today's GPUs that means a lot less), coming from a full node step down (110nm -> 90nm), mainly from shortening the pipeline because of the higher clocks 90nm allows (faster switching). So I hardly think a half step (65nm -> 55nm) will yield the same, regardless of whether what happened with G71 is even applicable to G94 to begin with.

Hmm, looking at the figures now, I am willing to bet the control logic in the G8x and G9x chips is significantly more than in the AMD counterparts?

And control logic is somehow not counted in the transistor budget? Which brings me right back to the question I asked Arun: is there some sort of difference between the actual transistor sizes themselves (theoretically speaking, if G94 was on 55nm, though the comparison does not have to be just RV670 and G94), or does chip layout play a significant role here?

Of course, more TMUs and ROPs, and more robust TMUs as well.

Twice the filtering and address(?) units, less samplers, the same number of ROPs (or pixel capability, that is). I'm not exactly sure what you mean by more "robust" TMUs either, or how that even fits in with the topic at all.

trinibwoy · Mar 26, 2008

Sound_Card said:
Twice the filtering and address(?) units, less samplers, the same number of ROPs (or pixel capability, that is). I'm not exactly sure what you mean by more "robust" TMUs either, or how that even fits in with the topic at all.

Less samplers?

Farhan · Mar 26, 2008

Razor1 said:
Well, you still have to have transistors for routing data on the larger processes. I remember the G70 to G71 shrink: that was one place where NV shaved off some transistors. I don't know how many, though, but it could be significant.

It really depends. On smaller processes you probably spend more transistors on repeaters for wires.

Sound_Card said:
That was a mere 24 million transistors (and in today's GPUs that means a lot less), coming from a full node step down (110nm -> 90nm), mainly from shortening the pipeline because of the higher clocks 90nm allows (faster switching). So I hardly think a half step (65nm -> 55nm) will yield the same, regardless of whether what happened with G71 is even applicable to G94 to begin with.

Why would it be less in today's GPUs? GPU clocks and functionality have been growing even with faster transistor switching speeds. Also the speed improvement from scaling is slowing down these days. In the G70->G71 case it could just be that they were very conservative with the G70 and less so with the G71. Or they could have engineered faster math circuits. Or some combination of the above. It's not just a function of the manufacturing process.

Sound_Card said:
And control logic is somehow not counted in the transistor budget? Which brings me right back to the question I asked Arun: is there some sort of difference between the actual transistor sizes themselves (theoretically speaking, if G94 was on 55nm, though the comparison does not have to be just RV670 and G94), or does chip layout play a significant role here?

The transistor density could certainly be different for different designs, and it could also be a choice for yield optimization. The smallest/densest layout may not have the best yields.

Sound_Card · Mar 26, 2008

trinibwoy said:
Less samplers?

FP32 texture sampling units. I'm not sure whether they are coupled with the address units on G80, in which case there would be no more than 64, or whether they even exist at all on G80! But in any case, R600/RV670 have 20 FP32 texture samplers per texture block, for a total of 80.

Farhan said:
It really depends. On smaller processes you probably spend more transistors on repeaters for wires.

Why would it be less in today's GPUs?

I was talking about the diminishing importance of 24 million transistors on a 500M chip as compared to a 304M chip, never mind the node differences!
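To put rough numbers on that point, here is a minimal sketch that takes the figures quoted in this thread at face value (24M saved, 304M and 500M totals). The helper name `share_of_budget` is made up for illustration.

```python
# Share of a chip's transistor budget represented by a fixed saving.
# All figures are the ones quoted in this thread, not verified die data.
SAVED_M = 24  # millions of transistors said to be shaved off in G70 -> G71


def share_of_budget(saved_m: float, total_m: float) -> float:
    """Return the saving as a percentage of the total transistor count."""
    return 100.0 * saved_m / total_m


for total_m in (304, 500):
    pct = share_of_budget(SAVED_M, total_m)
    print(f"{SAVED_M}M out of {total_m}M transistors = {pct:.1f}%")
```

The same 24M goes from roughly 8% of the budget to under 5%, which is the "diminishing importance" in question.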

Also the speed improvement from scaling is slowing down these days. In the G70->G71 case it could just be that they were very conservative with the G70 and less so with the G71. Or they could have engineered faster math circuits. Or some combination of the above. It's not just a function of the manufacturing process.

The transistor density could certainly be different for different designs, and it could also be a choice for yield optimization. The smallest/densest layout may not have the best yields.

Thanks, that seems like a reasonable perspective and makes plenty of sense.

trinibwoy · Mar 26, 2008

Sound_Card said:
FP32 texture sampling units. I'm not sure whether they are coupled with the address units on G80, in which case there would be no more than 64, or whether they even exist at all on G80! But in any case, R600/RV670 have 20 FP32 texture samplers per texture block, for a total of 80.

Those 80 samplers correspond to 80 texels per clock retrieved giving a total of 16 bilerps and 16 point samples. Now consider how many texels G80 retrieves per clock in order to produce 64 bilerps. So in terms of "samplers" G80 has far more than R600. Granted each sampling unit on R600 is a bit beefier as it does full speed FP16 but G80 more than makes up for that by having four times as many.
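The texel counting above can be sketched as follows. This is only the arithmetic from the post, assuming a bilerp reads a 2x2 footprint (4 texels) and a point sample reads 1; the function name is hypothetical, and G80 is counted on its 64 bilerps alone since no point-sample figure is given here.

```python
# Texels fetched per clock, derived from bilerp and point-sample counts.
TEXELS_PER_BILERP = 4  # a bilinear sample blends a 2x2 texel footprint
TEXELS_PER_POINT = 1   # a point sample reads a single texel


def texels_per_clock(bilerps: int, point_samples: int = 0) -> int:
    """Total texels needed per clock for the given sample mix."""
    return bilerps * TEXELS_PER_BILERP + point_samples * TEXELS_PER_POINT


r600_texels = texels_per_clock(bilerps=16, point_samples=16)
g80_texels = texels_per_clock(bilerps=64)

print("R600 texels/clock:", r600_texels)
print("G80 texels/clock: ", g80_texels)
print("Bilerp ratio G80:R600 =", 64 // 16)
```

Under these assumptions R600's 80 samplers line up exactly with 16 bilerps plus 16 point samples, and the "four times as many" refers to the bilerp count (64 vs 16).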

Sound_Card · Mar 26, 2008

trinibwoy said:
Those 80 samplers correspond to 80 texels per clock retrieved giving a total of 16 bilerps and 16 point samples. Now consider how many texels G80 retrieves per clock in order to produce 64 bilerps. So in terms of "samplers" G80 has far more than R600. Granted each sampling unit on R600 is a bit beefier as it does full speed FP16 but G80 more than makes up for that by having four times as many.

Thanks.. I was strangely confused.

Farhan · Mar 26, 2008

Sound_Card said:
I was talking about the diminishing importance of 24 million transistors on a 500M chip as compared to a 304M chip, never mind the node differences!

Ah, sorry, I misunderstood that.
If you look at it in terms of reducing pipeline stages, however, a single pipeline stage in that 500M GPU would probably have more transistors in it than one in the 300M chip.

Sound_Card · Mar 26, 2008

From Fudo...

http://www.fudzilla.com/index.php?option=com_content&task=view&id=6467&Itemid=65

I have a feeling RV770 is going to be first to market, followed by GT200, then R700. And if those June rumors are true for RV770, then we could have a pretty interesting 55nm G92/94 vs RV770 war.

Razor1 · Mar 26, 2008

Sound_Card said:
Thanks.. I was strangely confused.

Sorry about that, the last sentence was just what was running around in my mind; it wasn't too coherent.

LordEC911 · Mar 28, 2008

Finalized RV770 specs?

Source- VrZone

I'm liking the clocks and like everyone else said, these seem a bit more realistic than the 800SP rumor.

ECH · Mar 28, 2008

we await final specs

Kaotik · Mar 28, 2008

Aren't those the EXACT same as posted ages ago by a few sites?

w0mbat · Mar 28, 2008

Yes, they are. Here's the original news post: http://www.nordichardware.com/news,7356.html

aca · Mar 28, 2008

Kaotik said:
Aren't those the EXACT same as posted ages ago by a few sites?

yep.

itaru · Mar 28, 2008

Perhaps it is a fake.

Jawed · Mar 28, 2008

I did a possible configuration based on this rumour back in February:

In the past I've described it as 12 SIMDs. I dislike this idea because that's a lot of control overhead and results in relatively coarse-grained redundancy (60 redundant ALU lanes as compared with 20 in RV670). Alternatively, I suppose, it's possible to implement it as 4 SIMDs - each set of 96 SPs sharing a program counter. That would have 20 redundant ALU lanes - but now the issue is the batch size of 96...

This arrangement is the same type as seen inside R580, where each SIMD is 3 quad ALUs (12 pipes) sharing a single TMU.

So, as a 4 SIMD design I'm not unhappy. Still a bit dubious about it being a 3:1 ALU:TEX ratio, though.
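For anyone wanting to check the lane counts, here is a rough sketch of the two layouts. It rests on assumptions that are not spelled out in the post: five ALU lanes per shader unit (4D+1D), one spare 5-lane unit per SIMD for redundancy, and a batch issued over 4 clocks. Those assumptions happen to reproduce the 20, 60 and 96 figures above, but the class and field names are purely illustrative.

```python
from dataclasses import dataclass

LANES_PER_UNIT = 5    # 4D+1D: five ALU lanes per shader unit
CLOCKS_PER_BATCH = 4  # assumption: one batch is issued over 4 clocks


@dataclass(frozen=True)
class Config:
    name: str
    total_sps: int  # marketing "SP" count, i.e. total ALU lanes
    simds: int

    @property
    def units_per_simd(self) -> int:
        return self.total_sps // (self.simds * LANES_PER_UNIT)

    @property
    def redundant_lanes(self) -> int:
        # assumption: one spare 5-lane unit per SIMD
        return self.simds * LANES_PER_UNIT

    @property
    def batch_size(self) -> int:
        return self.units_per_simd * CLOCKS_PER_BATCH


configs = [
    Config("RV670, 4 SIMDs", 320, 4),
    Config("RV770 rumour, 12 SIMDs", 480, 12),
    Config("RV770 rumour, 4 SIMDs", 480, 4),
]

for c in configs:
    print(f"{c.name}: {c.units_per_simd} units/SIMD, "
          f"{c.redundant_lanes} redundant lanes, batch of {c.batch_size}")
```

With these assumptions the 4 SIMD layout keeps redundancy at RV670's level but grows the batch from 64 to 96, which is the trade-off being weighed.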

Jawed

whocares · Mar 28, 2008

Today's rumour from Chiphell says:

RV770 final specifications
480SP (RV670 320)
Framework used R600, 4D +1 D and D for every 96 (RV670 every 64 D)
32 TMUs (double RV670's)
Frequency 800 ~ 900MHz, depending on the final outcome of TSMC volume production scheduled listing price (RV670 reference listed prices)
Finally tell you that the version of RV770-how do not think it is RV670 twins, the future price trend can also RV670 reference to the current series.

4D+1D looks like a Xenos core design. But other rumours reject all speculative RV770 specs so far.
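A quick sanity check on what the rumoured numbers would imply, as a sketch only. It assumes 5 lanes per ALU unit (4D+1D), 16 TMUs for RV670 (half of the rumoured 32), and one bilerp per TMU per clock; the function names are made up.

```python
LANES_PER_UNIT = 5  # 4D+1D: five ALU lanes per shader unit


def alu_tex_ratio(sps: int, tmus: int) -> float:
    """ALU units (5-lane groups) per texture unit."""
    return (sps // LANES_PER_UNIT) / tmus


def gbilerps_per_second(tmus: int, clock_mhz: int) -> float:
    """Peak bilinear samples per second in billions, at 1 bilerp/TMU/clock."""
    return tmus * clock_mhz / 1000


print("RV670 ALU:TEX        =", alu_tex_ratio(320, 16))
print("RV770 rumour ALU:TEX =", alu_tex_ratio(480, 32))
for mhz in (800, 900):
    print(f"RV770 rumour at {mhz}MHz: {gbilerps_per_second(32, mhz):.1f} Gbilerps/s")
```

So the rumour would take the ratio from 4:1 down to 3:1, with the 800 to 900MHz range giving roughly 25.6 to 28.8 billion bilerps per second under these assumptions.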

fellix · Mar 28, 2008

Still a bit dubious about it being a 3:1 ALU:TEX ratio, though.

Well, I certainly won't complain for the doubled bilerp rate.

The million dol... euro question here is how the batch preprocessing is done at the top level. Or maybe there will be a two-level "distributed" design. ATi really loves round square based structures, here.

Jawed · Mar 28, 2008

whocares said:
Framework used R600, 4D +1 D and D for every 96 (RV670 every 64 D)

I interpret that last "D" as a reference to the sequencer, i.e. a sequencer controls groups of 96 SPs, while in RV670 a sequencer controls groups of 64 SPs.

4D+1D looks like a Xenos core design.

The 1D in Xenos is a transcendental unit, unable to do MAD, giving 2 instructions per clock. In R600 and all later GPUs it's 5D MAD with one lane also doing transcendentals (and extra integer instructions), making up to 5 different instructions per clock.

Jawed

Jawed · Mar 28, 2008

fellix said:
The million dol... euro question here is how the batch preprocessing is done at the top level. Or maybe there will be a two-level "distributed" design. ATi really loves round square based structures, here.

What do you mean by batch preprocessing?

In R600 there's an interesting hierarchy of processors:

Code:

         Sequencer
         |   |   |
    ------   |   -------
    |        |         |
   ALU    Vertex    Texture

A shader program consists of Sequencer instructions, with some Sequencer instructions being calls to subroutines of type ALU, Vertex or Texture. So you can think of the "shader" as being a network of four types of programmable processor.
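As a toy illustration of that hierarchy (not the real instruction encoding), the sketch below models a shader as a list of Sequencer-level entries, each of which calls a clause of type ALU, Vertex or Texture. The instruction strings and all names are hypothetical.

```python
from enum import Enum
from typing import List, Tuple


class Clause(Enum):
    ALU = "ALU"
    VERTEX = "Vertex"
    TEXTURE = "Texture"


# A "program" is a list of Sequencer entries: (clause type, clause body).
Program = List[Tuple[Clause, List[str]]]


def run_sequencer(program: Program) -> List[str]:
    """Walk the Sequencer entries and 'call' each clause in order."""
    trace: List[str] = []
    for clause, body in program:
        trace.append(f"SEQ: call {clause.value} clause ({len(body)} instr)")
        for instr in body:
            # Each clause type would run on its own processor.
            trace.append(f"  {clause.value}: {instr}")
    return trace


# Hypothetical shader: fetch a vertex, sample a texture, then do some math.
shader: Program = [
    (Clause.VERTEX, ["fetch position"]),
    (Clause.TEXTURE, ["sample diffuse map"]),
    (Clause.ALU, ["mul colour, light", "add colour, ambient"]),
]

for line in run_sequencer(shader):
    print(line)
```

The point of the structure is that control flow lives only at the Sequencer level, while each clause is a straight run of instructions for one of the three other processor types.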

Jawed
