Upcoming ATI Radeon GPUs (45/40nm)

w0mbat

What do you expect from the upcoming ATI GPU series? How do you think ATI will try to get the best out of their next (R(V)800?) GPU series? Will they use a 45nm or even a 40nm process? Will there be an RV7xx refresh, or is the next step a completely new design?

Just post your thoughts here :idea:


NordicHardware just posted that they expect the first 40nm GPUs, RV740 and RV870, in Q1 2009.
http://www.nordichardware.com/news,7946.html


My thoughts:
RV870 will be an RV770 refresh @ 40nm
~1.6 billion transistors
25 SIMDs
400 5D ALUs (2000 SPs)
100 TMUs
16-24 ROPs (8z/clk)
256-bit MC @ 5Gbps GDDR5
600-800MHz engine clock

Or fewer ALUs but a separate shader clock.
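The unit counts in the list follow from the per-SIMD layout. A minimal sketch, assuming RV770-style SIMDs (16 five-wide units and 4 TMUs each) and counting a multiply-add as two FLOPs per lane; the helper names are hypothetical:

```python
# Derive headline figures from the speculated RV870 layout above.
# Assumptions (not AMD specs): RV770-style SIMDs with 16 five-wide ("5D")
# units and 4 TMUs each; one multiply-add (2 FLOPs) per ALU lane per clock.

LANES_PER_UNIT = 5   # ALU lanes ("SPs") per 5D unit
UNITS_PER_SIMD = 16  # 5D units per SIMD
TMUS_PER_SIMD = 4    # texture units attached to each SIMD
FLOPS_PER_LANE = 2   # multiply-add counted as two FLOPs


def gpu_units(simds: int) -> dict:
    """Return 5D-unit, ALU-lane and TMU counts for a given SIMD count."""
    units = simds * UNITS_PER_SIMD
    return {
        "5d_units": units,
        "alu_lanes": units * LANES_PER_UNIT,
        "tmus": simds * TMUS_PER_SIMD,
    }


def peak_gflops(alu_lanes: int, clock_mhz: float) -> float:
    """Theoretical single-precision peak in GFLOPS."""
    return alu_lanes * FLOPS_PER_LANE * clock_mhz / 1000.0


def bandwidth_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s: bytes per transfer times data rate."""
    return bus_bits / 8 * gbps_per_pin


spec = gpu_units(25)
print(spec)  # {'5d_units': 400, 'alu_lanes': 2000, 'tmus': 100}
for mhz in (600, 800):
    print(f"{mhz} MHz: {peak_gflops(spec['alu_lanes'], mhz):.0f} GFLOPS")
print(f"{bandwidth_gbs(256, 5.0):.0f} GB/s")
```

The same formula reproduces RV770's often-quoted 1.2 TFLOPS (800 lanes at 750MHz).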
 
Duh!
And they said "no more big chips"? :D
/any time frame for 450mm wafers being scheduled for mass production?/

Anyway, I don't think a 40nm design would be a pure shrink, as the 260 sq.mm mark is quite comfortable, even now. Maybe beefing up the TMUs (I want my single-cycle FP16 back), enlarging some shared-memory buffers (16K > 32/64K) and, most importantly, yet another clock bump to the 850~900MHz mark for reference boards.
Economy of scale will probably be the main agenda here again.
 
In the eyes of Intel, TSMC, Toshiba, and Samsung, the best case for 450mm wafers was something in the region of 2012.

In the eyes of the equipment and tool manufacturers, the later the better.
The 300mm wafer transition turned out to be not such a good thing for them.

The likely costs and extremely reduced market size mean that unless the big chip manufacturers start paying some serious cash to finance the effort, it won't happen for quite some time longer, though I don't know enough about the dynamics of the equipment industry to say how many years more.



As for RV870, the rumor said that R870 would have 2000 ALUs, which means RV870 would have 1000 (assuming R870 is a dual-RV870 board, as R700 is a dual-RV770 board).
Given that RV770 already has 800, that's a relatively modest increase despite the jump from 55nm to 40nm.
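To put "relatively modest" in numbers, here is a sketch comparing the rumored 1000 ALUs per chip with what ideal areal scaling from 55nm to 40nm would allow in the same die area. Ideal scaling is an assumption here, an upper bound rather than a prediction:

```python
# Compare the rumored per-chip ALU growth with ideal areal scaling.
# Assumption: ideal scaling, i.e. logic area shrinks with the square of the
# feature size. Real processes fall short of this, and I/O pads barely shrink.

def ideal_density_gain(old_nm: float, new_nm: float) -> float:
    """How many times more logic fits in the same area under ideal scaling."""
    return (old_nm / new_nm) ** 2


rv770_alus = 800
r870_alus = 2000               # rumored total for the dual-chip board
rv870_alus = r870_alus // 2    # per chip, assuming two GPUs on the board

growth = rv870_alus / rv770_alus
ideal = ideal_density_gain(55, 40)
print(f"ALU growth: {growth:.2f}x, ideal density gain: {ideal:.2f}x")
# ALU growth: 1.25x, ideal density gain: 1.89x
```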

This might make sense, if the power improvements lag as far behind the density improvements as was reported earlier.

I'm curious as to what's going on at the IHVs.
Did AMD sort of eat into its own future by the more significant redesign of RV770 compared to how much GT200 hewed to G92, or is it that Nvidia is further away from the desired design target for late 2009/early 2010?
 
1.6 billion transistors on the TSMC 45nm process would be about the same size as RV770. I don't think they can make a 2000 SP, 100 TMU chip with that "few" transistors. My bet would be in the range of 1280-1600 SPs and 64-80 TMUs, still with 16 ROPs at 4z per clock and a clock speed of about 900MHz.
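A rough check of both claims, as a sketch: it scales RV770's commonly cited figures (about 956 million transistors in about 256mm² at 55nm) by transistor count and by ideal areal scaling to 45nm, then maps the SP range onto RV770-style SIMDs of 80 ALU lanes and 4 TMUs. Ideal scaling and the SIMD layout are assumptions:

```python
# Sanity-check the "1.6 billion transistors at 45nm" estimate and the SP/TMU range.
# Assumptions: RV770 ~956M transistors in ~256 mm^2 at 55nm; ideal areal scaling
# (area ~ feature size squared); RV770-style SIMDs with 80 ALU lanes and 4 TMUs.

RV770_TRANSISTORS = 956e6
RV770_AREA_MM2 = 256.0
LANES_PER_SIMD = 80
TMUS_PER_SIMD = 4


def estimated_area_mm2(transistors: float, node_nm: float, ref_node_nm: float = 55.0) -> float:
    """Scale RV770's area by transistor count and by ideal density gain."""
    density_factor = (node_nm / ref_node_nm) ** 2
    return RV770_AREA_MM2 * (transistors / RV770_TRANSISTORS) * density_factor


def tmus_for_sps(sps: int) -> int:
    """TMU count implied by an SP count if whole RV770-style SIMDs are used."""
    return (sps // LANES_PER_SIMD) * TMUS_PER_SIMD


print(f"1.6B transistors at 45nm: ~{estimated_area_mm2(1.6e9, 45):.0f} mm^2")
for sps in (1280, 1600):
    print(sps, "SPs ->", tmus_for_sps(sps), "TMUs")
```

Under these assumptions the estimate lands within roughly 12% of RV770's area, which fits "about the same size" at this level of precision, and the 1280-1600 SP range maps onto 16-20 SIMDs and exactly the 64-80 TMUs suggested.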
 
At 40/32nm nodes, if ATi decides to recycle the RV770 design from a "compact" PoV, is it viable to consider adding an eDRAM array to the core--and simplifying the local memory buffer req's? I mean... with all the GDDR5 expected performance bumps and (don't bash me, here) the possibility of adopting XDR2+ most probably they should sound the "go-ahead" horn, for the gazillions of SPs! :LOL:
 
Looking at the RV770 die shot, the physical I/O stuff along the edges of the die amounts to 24% of the entire die. Presumably this stuff would all end up the same size at 40nm.

So 76% of RV770 is graphics logic. 40% of RV770 is taken by the clusters (ALUs+TUs). So, 36% of RV770 is non-cluster logic. At 40nm that logic could be unchanged in capability (i.e. 16 RBEs, 4x MCs, 1 hub), but presumably would scale.
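Because the I/O ring does not shrink, a pure shrink saves less than the process alone suggests. A sketch, assuming the 24% I/O share above keeps its absolute area and the remaining 76% scales by a single factor; the factor is left as a parameter since the real 55nm-to-40nm scaling is uncertain, and the ideal (40/55)² value is only an illustration:

```python
# Die area after a pure shrink when the edge I/O does not scale.
# Assumptions: 24% of the die is I/O that stays the same absolute size;
# the other 76% (clusters plus non-cluster logic) scales by `logic_scale`.

IO_SHARE = 0.24
CLUSTER_SHARE = 0.40
NON_CLUSTER_SHARE = 1.0 - IO_SHARE - CLUSTER_SHARE  # ~0.36


def pure_shrink_area(die_mm2: float, logic_scale: float, io_share: float = IO_SHARE) -> float:
    """New die area: fixed I/O plus scaled logic."""
    return die_mm2 * (io_share + (1.0 - io_share) * logic_scale)


ideal_40nm = (40 / 55) ** 2  # ~0.53, an optimistic bound
print(f"non-cluster logic share: {NON_CLUSTER_SHARE:.2f}")
print(f"RV770 pure shrink, ideal scaling: {pure_shrink_area(256, ideal_40nm):.0f} mm^2")
print(f"same die with no I/O penalty:     {256 * ideal_40nm:.0f} mm^2")
```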

Scaling from 55nm to 40nm is supposed to be unusual in some respect - I can't remember if the scaling is considerably better or considerably worse than a simple areal evaluation would imply.

Jawed
 
So, about 16% smaller going from 55nm to 40nm?

So a "<=20% bigger" refresh of RV770 seems pretty likely then.

If just RV770's clusters are shrunk to 84%, then naively there's room for 19% more of them :p 960 ALUs :D

I'm doubtful a refresh would make any real changes to the MCs, RBEs, L2s, so they would shrink too. In that case that leaves room for the clusters to grow by 36% while retaining a die size of 256mm2.
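Both headroom figures can be reproduced from the die-shot shares. A sketch, assuming the 24% I/O / 36% non-cluster logic / 40% clusters split, a 0.84 area factor for anything that shrinks, and a fixed die size:

```python
# Cluster headroom at a constant die size, using the RV770 die-shot shares.
# Assumptions: 24% I/O (does not shrink), 36% non-cluster logic, 40% clusters,
# and an area factor of 0.84 for whatever does shrink.

IO, OTHER, CLUSTERS = 0.24, 0.36, 0.40
SHRINK = 0.84
RV770_ALU_LANES = 800


def headroom_clusters_only() -> float:
    """Only the clusters shrink; their old area is refilled with more clusters."""
    return CLUSTERS / (CLUSTERS * SHRINK)


def headroom_all_logic() -> float:
    """Clusters and non-cluster logic shrink; I/O keeps its absolute area."""
    budget = 1.0 - IO - OTHER * SHRINK  # die share left over for clusters
    return budget / (CLUSTERS * SHRINK)


for name, factor in (("clusters only", headroom_clusters_only()),
                     ("all logic", headroom_all_logic())):
    print(f"{name}: {factor:.2f}x clusters, ~{RV770_ALU_LANES * factor:.0f} ALU lanes")
```

The clusters-only case gives the ~19% (about 950 lanes, rounded up to 960 above); shrinking the non-cluster logic as well gives the ~36%.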

If, historically, a 256-bit GPU could be as small as ~190mm2, it seems that at about 256mm2 ATI is paying quite a high price for the combination of GDDR5 and CrossFireX Sideport.

Is this the approximate minimum size for all RVx70 GPUs for a few years to come? If so, isn't this GPU going to get progressively more and more expensive with each new node (presuming that each new node has worse yields per mm2)?
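The cost question can be made concrete with two textbook approximations: the gross-dies-per-wafer formula and a Poisson yield model. A sketch; the defect densities and wafer cost below are hypothetical placeholders, not foundry data, chosen only to show how cost per good die responds when defects per mm² rise at a fixed die size:

```python
import math

# Cost per good die for a fixed die size under a Poisson yield model.
# Assumptions: 300mm wafers, no edge exclusion or scribe lines, and
# hypothetical defect densities and wafer cost (illustrative numbers only).


def dies_per_wafer(die_mm2: float, wafer_mm: float = 300.0) -> int:
    """Common gross-die approximation: wafer area / die area minus edge loss."""
    r = wafer_mm / 2
    gross = math.pi * r * r / die_mm2 - math.pi * wafer_mm / math.sqrt(2 * die_mm2)
    return int(gross)


def poisson_yield(die_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero defects: exp(-area * defect density)."""
    return math.exp(-(die_mm2 / 100.0) * defects_per_cm2)


def cost_per_good_die(die_mm2: float, defects_per_cm2: float, wafer_cost: float) -> float:
    good_dies = dies_per_wafer(die_mm2) * poisson_yield(die_mm2, defects_per_cm2)
    return wafer_cost / good_dies


WAFER_COST = 5000.0  # hypothetical, arbitrary currency units
for d0 in (0.2, 0.4):  # hypothetical defects per cm^2
    print(f"D0={d0}: yield {poisson_yield(256, d0):.0%}, "
          f"cost per good 256mm^2 die {cost_per_good_die(256, d0, WAFER_COST):.0f}")
```

A newer node usually also carries a higher wafer cost, which would compound the effect.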

Jawed
 
My thoughts ...

SP & TMUs:
RV670 -> 4 SIMD cores, each with 16 SPUs (4D+1 = 80 ALUs) and 4 TMUs -> 320 ALUs + 16 TMUs
RV770 -> 10 SIMD cores, each with 16 SPUs (4D+1 = 80 ALUs) and 4 TMUs -> 800 ALUs + 40 TMUs (+6 SIMD cores vs RV670)
RV870 -> 16 SIMD cores, each with 16 SPUs (4D+1 = 80 ALUs) and 4 TMUs -> 1280 ALUs + 64 TMUs (+6 SIMD cores vs RV770)

ROPs, MC & BUS:
RV670 -> 4 RBEs, each with 4 ROPs x 2z/clk (16 ROPs), 4 MC -> 256-bit bus (72GB/s, 512MB/1GB GDDR4 @ 2.25GHz)
RV770 -> 4 RBEs, each with 4 ROPs x 4z/clk (16 ROPs), 4 MC -> 256-bit bus (115GB/s, 512MB/1GB/2GB GDDR5 @ 3.6GHz)
RV870 -> 4 RBEs, each with 4 ROPs x 4z/clk (16 ROPs), 4 MC -> 256-bit bus (128GB/s, 512MB/1GB/2GB GDDR5 @ 4.0GHz)
RV870 -> 6 RBEs, each with 4 ROPs x 4z/clk (24 ROPs), 6 MC -> 384-bit bus (192GB/s, 768MB/1.5GB GDDR5 @ 4.0GHz)
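Every bandwidth figure above follows from bus width times per-pin data rate. A short sketch; note that the 72GB/s quoted for RV670 corresponds to about 2.25Gbps per pin on a 256-bit bus (the HD3870's 1125MHz GDDR4), which is the rate used here:

```python
# Peak memory bandwidth = (bus width in bytes) x (effective data rate per pin).

def bandwidth_gbs(bus_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s for a bus width in bits and a data rate in Gbps/pin."""
    return bus_bits / 8 * data_rate_gbps


configs = [
    ("RV670, GDDR4", 256, 2.25),
    ("RV770, GDDR5", 256, 3.6),
    ("RV870 256-bit, GDDR5", 256, 4.0),
    ("RV870 384-bit, GDDR5", 384, 4.0),
]

for name, bits, rate in configs:
    print(f"{name:22s} {bits}-bit @ {rate} Gbps -> {bandwidth_gbs(bits, rate):.1f} GB/s")
```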

SIZE:
RV670 -> 192mm²
RV770 -> 260mm²
RV870 -> ~260mm² (~280mm²)
 
Couldn't they combine faster GDDR5 with a 128-bit bus? Isn't GDDR5 expected to scale far beyond the speed the 4870 currently ships with?

Regards,
SB
 
GDDR5 is expected to scale to 7GHz. So, when it does, a 128-bit bus will give almost the same bandwidth as the HD4870 currently has.
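Solving the bandwidth relation for the data rate shows how close 7Gbps gets. A sketch, taking the HD4870's 256-bit bus at 3.6Gbps per pin (115.2GB/s) as the target:

```python
# Per-pin data rate a narrower bus needs to match a target bandwidth.

def bandwidth_gbs(bus_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: bus width in bytes times per-pin data rate."""
    return bus_bits / 8 * data_rate_gbps


def required_data_rate_gbps(target_gbs: float, bus_bits: int) -> float:
    """Invert bandwidth = bus_bits / 8 * rate for the rate."""
    return target_gbs / (bus_bits / 8)


hd4870 = bandwidth_gbs(256, 3.6)   # 115.2 GB/s
print(f"128-bit needs {required_data_rate_gbps(hd4870, 128):.1f} Gbps to match HD4870")
print(f"128-bit at 7 Gbps: {bandwidth_gbs(128, 7.0):.1f} GB/s")
```

So 7Gbps on a 128-bit bus lands at about 97% of the HD4870's bandwidth; matching it exactly would take 7.2Gbps.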

Of course it's still arguable how much bandwidth HD4870 requires - since it seems it's difficult to find games where it's significantly bandwidth limited (or, if you prefer, to find review sites testing at bandwidth-busting settings). But it appears there may be some complex latency-related factors at play and that GDDR5 clocks for HD4870 have been chosen for latency more than bandwidth. Dunno.

Jawed
 
Funny thing is, that was posted at a time when ATI was expected (by me at least) to continue increasing ALU:TEX. RV770 seems to mark the start of a new era, where this ratio holds at 4:1. So it's looking pretty unlikely there'll be 2000 ALU lanes on a single ATI GPU any time soon. Kinda looks like we'll have to wait for 32nm...
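The 4:1 figure presumably counts five-wide ALU units per TMU (RV770: 800 lanes = 160 units against 40 TMUs). Under that reading, here is a sketch of what a fixed ratio implies for a 2000-lane chip; the counting convention is an assumption:

```python
# ALU:TEX ratio counted as five-wide ALU units per TMU (an assumed convention).

LANES_PER_UNIT = 5


def alu_tex_ratio(alu_lanes: int, tmus: int) -> float:
    """Five-wide ALU units per texture unit."""
    return alu_lanes / LANES_PER_UNIT / tmus


def tmus_needed(alu_lanes: int, ratio: float = 4.0) -> float:
    """TMUs required to keep a given ALU:TEX ratio."""
    return alu_lanes / LANES_PER_UNIT / ratio


print(alu_tex_ratio(800, 40))   # RV770 -> 4.0
print(tmus_needed(2000))        # 2000 lanes held at 4:1 -> 100.0
```

Holding 4:1, a 2000-lane chip would need 100 TMUs, the same count as in the first post's speculative spec.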

Jawed

Hold on now, one generation of products isn't enough to pronounce a trend shift. I don't believe so, anyway. ATi has long held to the notion that compute power should increase with successive generations relative to texture filtering/sampling abilities. Why change now?

I believe R7xx is a "correction" to the mistake that was R6xx and its horrible lack of texturing, z-fill, and AA sample rates. I'm sure you'd agree with me on this. Now that these mistakes have been corrected, there's no need to do so again. Thus, ATi can return to their preferred design philosophy with the R8xx generation of products if they are in a position to do so (and with the shrink to 40nm I can see no reason why they wouldn't be).
 