AMD: R7xx Speculation

Status
Not open for further replies.
Ninjaprime: Could you explain how R580 could carry three times as many ALUs as R520 while being only 18% larger? ;)

Well, in the "old days" before unified processing, pixel shaders were individually much smaller and less complex than other units, right? So if the 16 PS in R520 took up only 9% of the total die area, and the PS were the only major change in R580, then it would be about right, wouldn't it?
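The arithmetic behind this argument can be checked in a couple of lines. This is only a minimal sketch, assuming area scales linearly with unit count and that nothing else on the die changes; the 9% share is the poster's hypothetical, and `die_growth` is a made-up helper name.

```python
def die_growth(block_share: float, scale: float) -> float:
    """Relative growth of the whole die when one block, occupying
    block_share of the original die, is multiplied by `scale` and
    everything else stays the same size."""
    return block_share * (scale - 1.0)

# Hypothetical from the post: pixel shaders are 9% of R520, R580 triples them.
growth = die_growth(0.09, 3.0)
print(f"die growth: {growth:.0%}")
```

Under those assumptions a block that is 9% of the die and gets tripled adds 18% to the total, which is where the "about right" comes from.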

Now though, with processing unified, they are directly scalar, each taking up x amount of die space. Sure, they are only part of the die, with ROPs, TMUs, etc. taking up space, but they are the largest part of it. With the rumored doubling of the TMUs, even if the 25% more shaders only added 15-20% or so more die space, it's easy to see how you could hit a 25% overall increase in die size.
 
Now though, with processing unified, they are directly scalar, each taking up x amount of die space. Sure, they are only part of the die, with ROPs, TMUs, etc. taking up space, but they are the largest part of it.

This is simply not correct. TMUs, control logic and caches take up the most space, and ATI's ALU density, for one, seems to be much higher than Nvidia's. That is because each group of "5 SP" is in reality one vec5 superscalar ALU, so RV770 should have 160 of these ALUs instead of the 64 in RV670.
 
Man there are some monstrous numbers being thrown around here :oops:

I think it's interesting that certain people were dropping hints way back that RV770 would be 'like a new R580 for ATI', and a bunch of these 800SP suggestions are literally 2.5x the ALU/TMU of RV670, which is pretty close to the 3x ALU upgrade from R520 to R580 o_O

But it just seems so crazy :runaway:

That said, it's notable that R600/RV670 had big theoretical GFLOPS leads over G80/G92, and those numbers for the new NV chip are a pretty big increase in width over G92, even if the clock speed is down.
 
Well, in the "old days" before unified processing, pixel shaders were individually much smaller and less complex than other units, right?
I don't think so:

R5xx-like shader unit:

(vec3+scalar, vec3 + scalar)

[MAD | MAD | MAD] [MAD]
[ADD | ADD | ADD] [ADD]


R6xx-like shader unit:

(5x scalar)

[MAD] [MAD] [MAD] [MAD] [MAD]


The control logic is a lot more complex, not the ALUs themselves. Those seem to be pretty cheap.
 
Well, in the "old days" before unified processing, pixel shaders were individually much smaller and less complex than other units, right? So if the 16 PS in R520 took up only 9% of the total die area, and the PS were the only major change in R580, then it would be about right, wouldn't it?

Now though, with processing unified, they are directly scalar, each taking up x amount of die space. Sure, they are only part of the die, with ROPs, TMUs, etc. taking up space, but they are the largest part of it. With the rumored doubling of the TMUs, even if the 25% more shaders only added 15-20% or so more die space, it's easy to see how you could hit a 25% overall increase in die size.

I heard that the shader core in RV670 took up only about 19% of the die area. I don't know if it's true, but it would mean that 64 SU (320 SP) take 38 mm^2 of the die, and that 96 additional SU take another 60 mm^2, which is the amount of the increase between RV670's and RV770's die sizes.
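For anyone wanting to redo that estimate, here is a minimal sketch using only the figures quoted in the post (38 mm^2 for 64 shader units, 96 units added). It assumes area grows linearly per unit, which ignores any shared control logic; `extra_area` is a made-up helper name.

```python
def extra_area(base_area_mm2: float, base_units: int, added_units: int) -> float:
    """Area needed for added_units if every unit costs the same as the
    average unit in the existing block (linear scaling assumption)."""
    per_unit_mm2 = base_area_mm2 / base_units
    return per_unit_mm2 * added_units

# Figures from the post: 64 SU occupy 38 mm^2, RV770 is rumored to add 96 more.
added_mm2 = extra_area(38.0, 64, 96)
print(f"extra shader area: {added_mm2:.1f} mm^2")
```

With these inputs the linear estimate lands a little under the rounded 60 mm^2 quoted above.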
 
This is simply not correct. TMUs, control logic and caches take up the most space, and ATI's ALU density, for one, seems to be much higher than Nvidia's. That is because each group of "5 SP" is in reality one vec5 superscalar ALU, so RV770 should have 160 of these ALUs instead of the 64 in RV670.

So the ALUs in RV670 only take up 10% or less of the die space? That doesn't seem right... Of course, I don't think I've ever seen a bare die shot of any ATI core, so I guess it's possible.
 
OK, if we assume the following:

that R600 was around 420 mm^2 on 80nm
that RV770 is 250 mm^2 on 55nm
that the transistor size exactly halves from 80nm to 55nm
that RV770 has 160 shaders
that nothing else is changed

that would mean RV770 has something like 20% more transistors/die area (yes, I know that isn't exactly true). So if we were to say that RV770 has 2.5 times the shader count while only taking up 20% more transistors/die area, that would mean that the shaders on R600 only took up something like 8% of the die space. :oops:

But we also expect to see some MTU's on RV770 etc., which would mean even less die space for shaders on R600.

If we go off what was posted above (276 mm^2), you end up with a 31% die size/transistor increase.

Seeing that AF performance seems to hurt on R600 (more than AA, it seems), who wants to take a punt that something was broken in the MTU's (which take a lot of die space) and thus contributed to its underwhelming performance? You could then extrapolate that it's fixed on RV770, and that to meet their shader-to-texture performance ratios they didn't have to add too much die space to the MTU's. This would allow for packing in the shaders.

Also, if this was the case, wouldn't they have to beef up the Z fill rate, or they will quickly run into an AA wall?

Most likely I have no idea what I'm talking about, but it's still interesting to think about....
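The node-shrink estimate in this post can be reproduced with a short script. It is a rough sketch under the post's own assumptions (420 mm^2 at 80nm, area exactly halving at 55nm, shader count up 2.5x, nothing else changed); the helper names are made up for illustration.

```python
def scaled_area(area_mm2: float, area_scale: float) -> float:
    """Area of the same design after a shrink, given an area scale factor."""
    return area_mm2 * area_scale

def implied_share(growth: float, unit_ratio: float) -> float:
    """Share of the original die a block must occupy if multiplying only
    that block by unit_ratio grows the whole die by `growth`."""
    return growth / (unit_ratio - 1.0)

# Post's assumption: area exactly halves from 80nm to 55nm.
# (Ideal geometric scaling would be (55/80)**2, about 0.47.)
r600_at_55nm = scaled_area(420.0, 0.5)

results = {}
for rv770_mm2 in (250.0, 276.0):
    growth = rv770_mm2 / r600_at_55nm - 1.0
    share = implied_share(growth, 2.5)
    results[rv770_mm2] = (growth, share)
    print(f"{rv770_mm2:.0f} mm^2: growth {growth:.0%}, implied shader share {share:.0%}")
```

Under this particular model the 250 mm^2 case gives roughly 19% growth and a shader share of about 13%. A figure of 8% corresponds to dividing the growth by the full 2.5 rather than by the 1.5 that was added, so the conclusion depends on which of the two is meant.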
 
I heard that the shader core in RV670 took up only about 19% of the die area. I don't know if it's true, but it would mean that 64 SU (320 SP) take 38 mm^2 of the die, and that 96 additional SU take another 60 mm^2, which is the amount of the increase between RV670's and RV770's die sizes.

That would seem to fit if, and it's a big if, they didn't really add to anything else. That doesn't seem like a good idea; isn't RV670 TMU bound?
 
Of course, I'm assuming it's the 255 mm^2 die size since it seems the most common, but then it could be one of the larger sizes... I guess it really boils down to not enough info. :cry:
 
My problem with this ALU number is this. I'm assuming it's still 16 shaders / 80 ALUs per processor. So we're looking at 10 processors - a 150% increase in shading and control logic.

Now my question is....if there's a 150% increase in control + shading logic accompanied by a 25% increase in die size that means control+ALU in RV670 is taking up very little space. So what's taking up the rest?

In terms of R580, they only increased ALUs and control was untouched, so batch size tripled as well. Of course, batch size on RV770 might have gone up too, or batches could run for fewer than 4 clocks, which would offset any increase in width.
 
This is simply not correct. TMUs, control logic and caches take up the most space
You don't really know this. You could be correct that control logic is quite large, but I've got some doubts about the TMUs being that big, and I'm pretty sure you're dead wrong about caches using a large amount of die space (unless you also consider the register file as cache, but I count that as part of the ALU). Compared to CPUs, GPU caches are still small. Think about it: Intel fits 4MB on about 70 mm^2 on a 65nm node, and RV670 only has 256KB of L2 and 4x32KB of L1 texture cache. Sure, it has other caches too (for the ROPs, for instance), but probably nowhere near 4MB in total.
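To put a rough number on the cache comparison, here is a minimal sketch that scales the quoted CPU figure (4MB in about 70 mm^2 at 65nm) down to the texture cache sizes listed for RV670. It assumes the GPU caches reach the same density as a CPU cache on the same node, which is an assumption for illustration only; GPU caches are organized differently and RV670 is built on another process.

```python
# Figures quoted in the post.
CPU_CACHE_KB = 4 * 1024       # 4MB of CPU cache
CPU_CACHE_AREA_MM2 = 70.0     # approximate area at 65nm

# RV670 texture caches as listed in the post: 256KB L2 plus 4 x 32KB L1.
gpu_cache_kb = 256 + 4 * 32

# Assumption: identical mm^2 per KB for both chips.
gpu_cache_area_mm2 = gpu_cache_kb / CPU_CACHE_KB * CPU_CACHE_AREA_MM2
print(f"{gpu_cache_kb} KB -> about {gpu_cache_area_mm2:.1f} mm^2 at CPU cache density")
```

Even allowing a generous margin for lower density and the other caches mentioned, this kind of estimate stays in the single-digit mm^2 range, which is the point being made.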
 
That's another thing...wouldn't RV770's global register file be much larger than RV670's to support that kind of increase in arithmetic? Like Jawed, I would be gobsmacked if this is confirmed but I'm still a bit hesitant.
 
Seeing that AF performance seems to hurt on R600 (more than AA, it seems), who wants to take a punt that something was broken in the MTU's (which take a lot of die space) and thus contributed to its underwhelming performance?
I guess you meant TMUs? I've not yet seen even a reasonable idea of how big they really are, but I highly doubt there was anything broken with them other than there simply not being enough of them (RV670 has half the int8 filtering capability of the 9600GT and only a quarter that of the 8800GT, so you can't really expect good AF performance).
 
More performance numbers:

[Image: 20080607_52b9dabecf4ee472b1c9sCHFvBdoLGz9.jpg]
The stand-out there is 8800U scoring 3065 in Vantage Extreme. That's quite a bit higher than the seemingly rumoured 2800 for HD4870, which would indicate that ATI is still far off being able to fully use ~124GB/s.

The indication is also that GTX 280 will end up ~60%+ faster than 8800U. In this case, that's scaling that's better than the increase in bandwidth.

Jawed
 
w0mbat seems to be confirming that 800SP is correct ..

Let's assume that RV770 has 800 stream processors, but let's also assume that TMU's get bumped up to 40 and RBE's stay at 16.

Would this chip not have the potential to go toe-to-toe with GTX 280? Could it be possible that ATI, aiming to make a price/performance part, ends up actually having a high end solution?
 