R600 how many pipes

turtle said:
First of all, I bet you'd like my screenname...Sorry, it's not for sale. ;)

I think R600 will be as many others have said: 32 ROPs, 3 ALUs per ROP, with 32 TMUs and 64 shaders. In theory, that should be quite efficient.

Before the full disclosure of the R520/R580 architecture, that old rumour of R600's specs seemed odd (http://www.cdrinfo.com/Sections/News/Details.aspx?NewsId=14184), but now it seems possible, considering the frequency that was possible with 90nm, where GDDR4 is heading, and how Xenos, FWIU, uses 3 (4) ALUs per ROP (one for buffer, so really 16x3). No doubt R600 will use a combination of R520/Xenos tech, so 64p/32t seems possible with 32 (32x3) ROPs. It'd also put it at an interesting 2:1 ratio, somewhere in between the current 1:1 of R520, the future 3:1 (R580), and the 3:2 of G70 (and G71? [32/24], G80 [48/32]?) that we've got going now.

Perhaps what I said doesn't make sense, I don't know all the logistics as well as the hardcore here, but from what I gather it seems possible...and could be damn powerful.

http://imagestore.ugbox.net/aview/e8593141860f7a69148dcb92c9559f8b

65nm
64 Shader pipelines (Vec4+Scalar)
32 TMU's
32 ROPs
128 Shader Operations per Cycle
800MHz Core (R580 625/695)
102.4 billion shader ops/sec (R580 - 166 billion)
512GFLOPs for the shaders (R580 - 553.8)
2 Billion triangles/sec
25.6 Gpixels/Gtexels/sec
256-bit 512MB 1.8GHz GDDR4 Memory
57.6 GB/sec Bandwidth (at 1.8GHz) (GDDR4 memory is over 2.0 GHz)
WGF2.0 Unified Shader
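
As a rough sanity check, the headline rates in the list above can be reproduced from the unit counts and clocks. This is only a sketch of the arithmetic behind a rumour, not a statement about real hardware. It assumes each Vec4+scalar lane issues one multiply-add per clock (counted as 2 FLOPs) and that 1.8GHz is the effective memory data rate; neither assumption is stated in the list, and the names used are made up for illustration. The R580 figures in brackets are not checked here.

```python
# Rumoured R600 figures copied from the list above (speculation, not confirmed specs).
CORE_CLOCK_HZ = 800e6          # 800MHz core
SHADER_PIPES = 64              # 64 shader pipelines
LANES_PER_PIPE = 5             # Vec4 + scalar
SHADER_OPS_PER_CYCLE = 128     # as listed
TMUS = 32
BUS_WIDTH_BITS = 256
MEM_DATA_RATE_HZ = 1.8e9       # assumption: 1.8GHz is the effective data rate

# Assumption (not in the list): each lane issues one multiply-add per clock,
# counted as 2 floating-point operations.
FLOPS_PER_LANE_PER_CLOCK = 2


def derived_figures():
    """Return the headline rates implied by the unit counts and clocks."""
    return {
        "shader_gops": SHADER_OPS_PER_CYCLE * CORE_CLOCK_HZ / 1e9,
        "shader_gflops": (SHADER_PIPES * LANES_PER_PIPE
                          * FLOPS_PER_LANE_PER_CLOCK * CORE_CLOCK_HZ / 1e9),
        "gtexels": TMUS * CORE_CLOCK_HZ / 1e9,
        "bandwidth_gb": BUS_WIDTH_BITS / 8 * MEM_DATA_RATE_HZ / 1e9,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:.1f}")
```

Under those assumptions the results match the list: 102.4 billion shader ops/sec, 512 GFLOPs, 25.6 Gtexels/sec and 57.6 GB/sec.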
 
boltneck said:
So you think they are going to go from 16 to 32 ROPs in one generation?

Xenos is capable of HD with only 8 ROPs (with the help of its eDram). I would expect R600 to have 16 ROPs just like the current generation. The major changes will be how the rendering blocks will be arranged, and how data is moved/accessed between the blocks. R600 will differ from Xenos in a number of ways, but basically it will add a 4th rendering block, giving it 64 ALU's.


I don't think 64 ALUs are enough for a late 2006 highend GPU. the Xenos already has 64 ALUs, supposedly, for redundancy, with 48 being active. I'm leaning toward R600 having 96 ALUs, perhaps 128 ALUs, but 96 active. just a guess.

24 ROPs -- shouldn't GDDR4 provide enough bandwidth on 256-bit bus ?
 
turtle said:
I think R600 will be as many others have said: 32 ROPs, 3 ALUs per ROP, with 32 TMUs and 64 shaders.
So what's the difference between your ALUs and your shaders?
 
Megadrive1988 said:
I don't think 64 ALUs are enough for a late 2006 highend GPU. the Xenos already has 64 ALUs, supposedly, for redundancy, with 48 being active. I'm leaning toward R600 having 96 ALUs, perhaps 128 ALUs, but 96 active. just a guess.

24 ROPs -- shouldn't GDDR4 provide enough bandwidth on 256-bit bus ?

Currently the top of the line (for ATi) is 16 ALU's. When R580 comes out it will be 48 ALU's.

A unified shader is going to have better performance by design because of better use of resources across the architecture. Couple that with another rendering block of ALU's and it will likely be 50% faster than the R580.

The Transistor budget for 96 ALU's with 24 or 32 ROP's with the scheduling logic would be astronomical.
 
boltneck said:
Currently the top of the line (for ATi) is 16 ALU's. When R580 comes out it will be 48 ALU's.

of course if you compare it to unified design you also have to count the vertex alu's on r520 and r580.
 
Does anyone know how good the r500, for instance, is at dynamic branching compared to r520?

r580 should do a little worse in DB compared to r520
 
Megadrive1988 said:
I don't think 64 ALUs are enough for a late 2006 highend GPU. the Xenos already has 64 ALUs, supposedly, for redundancy, with 48 being active. I'm leaning toward R600 having 96 ALUs, perhaps 128 ALUs, but 96 active. just a guess.

24 ROPs -- shouldn't GDDR4 provide enough bandwidth on 256-bit bus ?
It seems to me people assume they know more about Xenos than they really do.
 
boltneck said:
A unified shader is going to have better performance by design because of better use of resources across the architecture. Couple that with another rendering block of ALU's and it will likely be 50% faster than the R580.
That is true only when you count both VS and PS units. If R580 has 12 vertex shaders and 48 pixel shaders, then the unified architecture with 60 shader units will be faster, but if it only has 48 then it will be slower.

The Transistor budget for 96 ALU's with 24 or 32 ROP's with the scheduling logic would be astronomical.
Don't see any reason to expect much less than half a billion transistors. R520 is already 320M, and R580 will be near 400M. Xenos supposedly has 64 ALU's when counting disabled shaders, and is only 230M. I know there's a lot more to consider, but it's certainly possible.

I'm also hoping for 96 ALU's and 32 texture units. 16 ROP's are plenty for me, but it'd be nice to be able to render 32 pix/clock to a single channel texture. I would love FP32 filtering and alpha blending, even if at reduced speed. There are some techniques that could make use of that feature. The last thing I'd like is triangle setup rate of 2 per clock. The USA should blast through those clipped and culled triangles.

That's one helluva wish list, but I can dream. :cool:
 
Mintmaster said:
That is true only when you count both VS and PS units. If R580 has 12 vertex shaders and 48 pixel shaders, then the unified architecture with 60 shader units will be faster, but if it only has 48 then it will be slower.

Rather 8 VS, or the numbers leaked so far are false. I don't know how you can judge anything just by the number of units. I'd prefer to know what each unit CAN do as well.

Don't see any reason to expect much less than half a billion transistors....

Stop here for a moment. Before you go on, take the D3D10 requirements into account first (which seem to be extremely high from what they're saying), and then we could speculate about the number of units. If the requirements really are that high, I wouldn't expect as big a leap in performance on D3D10 compared to the last high-end DX9.0 GPUs.
 
Megadrive1988 said:
I don't think 64 ALUs are enough for a late 2006 highend GPU. the Xenos already has 64 ALUs, supposedly, for redundancy, with 48 being active. I'm leaning toward R600 having 96 ALUs, perhaps 128 ALUs, but 96 active. just a guess.

24 ROPs -- shouldn't GDDR4 provide enough bandwidth on 256-bit bus ?


as far as I know Xenos has 48 physical ALUs (not more) running up to 64 threads.
(think of SMT/hyperthreading)
 
If I had to guess, I would say that the number of ROPs won't increase much beyond 16, if at all. The fact is, the math required per pixel is going way up, to the point where if you have even 8 ALUs per ROP you'll be taking several clock cycles to handle even fairly basic shader operations.

I just think it's going to be a waste of space to increase ROPs - space that would be better served in increasing the size of on-chip caches, adding ALUs, adding TMUs, or whatever.
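
To put rough numbers on that argument, here is a toy throughput model. It is only a sketch: it assumes every ALU retires one shader op per clock with perfect utilisation and every ROP can write one pixel per clock, which real hardware will not match, and the function names are hypothetical.

```python
def cycles_per_pixel(alu_ops_per_pixel, alus_per_rop):
    """Clocks of shader work needed before one ROP receives a finished pixel."""
    return alu_ops_per_pixel / alus_per_rop


def pixels_per_clock(total_alus, alu_ops_per_pixel, rops):
    """Pixel output rate, limited by either the shader ALUs or the ROPs."""
    if alu_ops_per_pixel == 0:
        # No shader work at all: the ROPs are the only limit.
        return float(rops)
    shader_limited = total_alus / alu_ops_per_pixel
    return min(float(rops), shader_limited)


if __name__ == "__main__":
    # 16 ROPs with 8 ALUs each (128 ALUs) running a 32-op pixel shader.
    print(cycles_per_pixel(32, 8))        # clocks between pixels per ROP
    print(pixels_per_clock(128, 32, 16))  # pixels finished per clock
    print(pixels_per_clock(128, 32, 32))  # doubling the ROPs changes nothing here
```

In this model, 16 ROPs with 8 ALUs each running a 32-op shader finish 4 pixels per clock, so most of the ROPs sit idle and doubling them gains nothing; extra ROPs only pay off when the shader is very short.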
 
But there are rendering algorithms (or phases of rendering, such as a z-only prepass) that have 0-length pixel shaders (well, I expect it's simply an absence of a pixel shader) - fill-rate needs to be as fast as possible in this case.

ATI's "normal" rate ROPs (as seen in X1800XT) should be a dying breed. X1600XT's ROPs are much better - and consequently the argument for more ROPs is moot.

Sadly, X1900XT's ROPS appear to be like X1800XT's, not X1600XT's. Big sigh.

Jawed
 
Mintmaster said:
Are you saying this widely propagated assumption about redundancy is false?
You'd have to ask Ati about specifics, but there is not an extra simd. Disabling a 4th of your shaders would mean some awful yields. Granted Megadrive did say supposedly, but I've seen a few people state this recently. Now back to more interesting discussion. :smile:
 
If Xenos redundancy works on the basis of dropping one of those four units, then it's a loss of 8% of the die - while the units themselves cover, in total, about 32% of the die.

Jawed
 