Let's remember the supposed doubling of TMUs, the 50% increase in ALUs and the hopefully tweaked ROPs. These improvements should fill up some of that extra bandwidth.
I concur on the ROPs, but I fail to grasp how on earth AMD would fit +50% shader ALUs and double the number of TMUs in just ~33% more die space. TMUs are supposedly pretty large in terms of die space and transistor count, especially those doing FP64 single-cycle…
G92 in comparison to G94 doubles both TMU and ALU count, and the die-space difference is still lower than ~30%. Does nVidia employ magicians? :smile:
That's because die space is taken up by more than just shader ALUs and TMUs...
After all, RV670 removed the 512-bit bus and added UVD, but still dropped by 34+ million transistors compared to R600...
From G94 to G92, transistor count went up by ~49%, and different kinds of logic on the GPU take up differing amounts of space, e.g. cache and register file sections can be considerably denser than the rest.
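To put a rough number on that, here is a minimal Python sketch that turns the two growth figures into an average density change. It assumes the ~28% die-space and ~49% transistor figures quoted in this thread for G94->G92; the function name `relative_density` is made up for illustration, and the result is only a chip-wide average that says nothing about which blocks got denser.

```python
def relative_density(transistor_growth: float, area_growth: float) -> float:
    """Return average transistor density after the change, relative to before.

    Both arguments are fractional increases, e.g. 0.49 for +49%.
    """
    return (1.0 + transistor_growth) / (1.0 + area_growth)


# Figures as quoted in the thread (assumed, not measured here):
# G94 -> G92: transistors +49%, die space +28%.
g92_vs_g94 = relative_density(transistor_growth=0.49, area_growth=0.28)

print(f"G94 -> G92 average density change: {g92_vs_g94 - 1.0:+.1%}")
```

In other words, if those figures are right, the extra units went into the die at a noticeably higher average density than the chip as a whole, which fits the point about denser logic types.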
That's on 65nm too. You would think that on 55nm the increase in die area would be even less.
Doubling the unit count on 65nm requires less than 30% more die space, while doubling the unit count on 55nm doesn't? Less? Or more? I don't follow this logic... The manufacturing node is irrelevant if you count in relative terms.
There are parts of the chip that have *fixed size* and don't scale down with the manufacturing process? I've never heard that before.
You're confusing die-size reductions based on manufacturing process with the relative increase in die size from adding units. We are discussing the latter. My use of "scale" isn't related to 55nm vs 65nm. It's in the context of RV635 vs RV670, for example.
Edit: Actually, I think I missed no-X's question completely. He was asking about G94->G92 scaling vs RV635->RV670 scaling. I don't think you can draw any conclusions about one by looking at the other.
I was talking about G94->G92 scaling vs. theoretical RV670->RV770 (rumoured 480 ALUs / 32 TMUs) scaling.
G94->G92 = ALUs +100%, TFs +100%, TAs +100%, ROPs and MC +0% ~ die space +28% / transistors +49%
RV670->RV770 = ALUs +50%, TFs +100%, TAs +0% (?), ROPs and MC +0% ~ die space +32%
As we expect a 7xx MHz clock speed, I'd say that RV770's design was targeted at higher transistor density rather than higher clock speed (another frenetic theory could be a 1.5x clocked shader domain, which would lead to a nice 4.5:1 ALU:TEX ratio when using 480 ALUs and 32 texture units; it would explain the ~1GHz rumours, but I doubt it...)
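For anyone wondering where a 4.5:1 figure could come from, here is a minimal Python sketch of the arithmetic. It rests on an assumption that is not spelled out in the post: that the ALUs are counted as individual lanes grouped into 5-wide shader units, as in R600-family parts, and that the ratio is taken per 5-wide unit. The RV670 baseline of 320 ALUs and 16 texture units is simply what the +50% and +100% figures above imply. The helper `alu_tex_ratio` and its parameter names are hypothetical.

```python
from fractions import Fraction


def alu_tex_ratio(alus, tex_units, lanes_per_unit=5,
                  shader_clock_multiplier=Fraction(1)):
    """Return the effective ALU:TEX ratio per texture unit.

    alus                    -- number of scalar ALU lanes
    tex_units               -- number of texture units
    lanes_per_unit          -- lanes per shader unit (ASSUMED 5-wide grouping)
    shader_clock_multiplier -- shader clock relative to the texture clock
    """
    shader_units = Fraction(alus, lanes_per_unit)
    return shader_units * shader_clock_multiplier / tex_units


# Baseline implied by the thread: 480 is +50% over 320, 32 is +100% over 16.
rv670 = alu_tex_ratio(320, 16)

# Rumoured configuration at a single clock domain.
rumoured = alu_tex_ratio(480, 32)

# Same configuration with the speculated 1.5x shader clock domain.
rumoured_fast = alu_tex_ratio(480, 32, shader_clock_multiplier=Fraction(3, 2))

for name, ratio in [("RV670", rv670),
                    ("480/32 @ 1.0x", rumoured),
                    ("480/32 @ 1.5x", rumoured_fast)]:
    print(f"{name}: {float(ratio):.1f}:1")
```

Under those assumptions the rumoured configuration would sit at 3:1 with a single clock domain, and the 1.5x shader clock is what would push it to 4.5:1.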
Increasing TF without increasing TA won't improve performance. You can't use texels in filtering until you've worked out which texels you need to fetch. EDIT: I should qualify that this is for bilinear. I suppose TA:TF = 1:2 could have a similar effect as in G80. Erm...
Because both ALUs and TUs access the register file directly (as far as I can tell), I think they have to be clocked "synchronously".