Were there ever any die photos of an e6760 available? I'm not implying anything - I just remember the discussion at the time, and it would be interesting to compare the differences between the two chips.
Don't think so, no. The comparison was about TDP & general perf-level more than anything IIRC.
Except his flat-out written statement is only partially correct. It is not fabbed at TSMC; it is fabbed at Renesas. He can't tell TSMC vs Renesas from the work they did, but he can tell 40nm vs 55nm.
Here is a source for it being fabbed at Renesas:
http://www.eetasia.com/ART_8800678216_499489_NT_f90242a2.HTM
If you don't have a login you can get to that same article from google without a login via:
http://www.google.com/url?sa=t&rct=...3DfFwTTWt5pAKSA&bvm=bv.48293060,d.cGE&cad=rja
Well, I really wouldn't know, but that's what's in the Durango dev docs: ~60% efficiency for Xenos vs near 100% for GCN.
http://www.neogaf.com/forum/showpost.php?p=48282244&postcount=109
Does anyone know the efficiency numbers for VLIW5 pipeline architectures?
Those 60% Xenos efficiency numbers are about right. AMD reported an average 3.4 slot utilization for VLIW5 as the reason why they dropped an ALU from their SPU and went VLIW4 in Cayman.
Which does bring up another angle to the 160, 240, 320 SP Latte debate. What if Nintendo took a 320 SP RV730 and made it VLIW4 by cutting every 5th SP resulting in 256 SPs? With better ability to code close to metal on consoles, developers could probably get better slot utilization than the 3.4 on PC, but the 5th SP would still be underutilized. By getting rid of the 5th SP, the Wii U loses a little bit of peak performance, which may contribute to why it may not be performing as well as some people think a 320 SP GPU should, but they save die space and power. It's still VLIW based so it wouldn't require the design effort that other custom work would and some of the changes made in Cayman could be used as a reference.
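As a rough sanity check on the numbers in this post, the arithmetic can be sketched in Python. The 3.4-slot average and the 320 SP starting point come from the post; the function names are made up for illustration, and none of this says anything about what Latte actually contains.

```python
def slot_utilization(avg_slots_filled: float, slots_per_bundle: int) -> float:
    """Average fraction of ALU slots doing useful work in a VLIW bundle."""
    return avg_slots_filled / slots_per_bundle


def drop_one_alu_per_unit(total_sps: int, width: int = 5) -> int:
    """SP count left after removing one ALU from every `width`-wide VLIW unit."""
    units, remainder = divmod(total_sps, width)
    if remainder:
        raise ValueError("SP count must be a multiple of the VLIW width")
    return units * (width - 1)


# 3.4 of 5 slots filled on average (figure quoted in the post).
print(f"VLIW5 utilization: {slot_utilization(3.4, 5):.0%}")

# Hypothetical 320 SP part with every 5th SP removed.
print(f"SPs after cutting every 5th: {drop_one_alu_per_unit(320)}")
```

3.4 out of 5 works out to 68%, which is in the same ballpark as the ~60% quoted for Xenos, and 320 SPs with one in five removed leaves the 256 mentioned above.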
That would be nice, but the shader blocks only have enough register pools for exactly 160 shaders. Also, I'm quite confident that there are 8 TMUs, which makes perfect sense for two SIMD cores. Here's a post I made on GAF explaining how I was able to identify the TMUs/L1 cache: http://www.neogaf.com/forum/showpost.php?p=59514681&postcount=5604
Sorry that I haven't been able to come by in a while.
Let's look at the 160 SPU shader hypothesis, and why there are good reasons to question it.
It is based on two underlying assumptions:
1. The SRAM blocks of Latte have to be arranged exactly as on the RV770 (visually as well as logically).
2. The analysis of the constituents of these particular blocks on the die shot is correct.
Which would lead to this conclusion:
3. The total number of SPUs is 160, arranged in 8 groups of 20.
But since the density of an SPU ALU block would then be equivalent to what AMD provided in 2007/8 on 55nm lithography, it also follows that either:
4a. Latte is actually produced in 55nm lithography.
or
4b. The Latte SPU blocks are roughly half the density of what AMD produced on their 40nm Brazos platform, two years earlier, with DX11 capable ALUs. For whatever reason.
4a is contradicted both by Chipworks and by the fact that the eDRAM density is quite close to what IBM is achieving on their 32nm Power7+! It is actually better than any product I've managed to find on 40nm. (55nm eDRAM cell sizes are a factor of two larger again when compared to 40/45nm cell sizes from TSMC/Renesas/IBM.) To achieve such high eDRAM density on 40nm, we have to assume that process maturity and relatively low clock targets have contributed to the good result. Which is not inconceivable, after all. But to assume that it could produce yet another factor of two in density... Suffice to say that there isn't a single example even in the ballpark of such density on 55nm anywhere.
4b is simply bizarre. Latte was introduced two years after AMD's Ontario, has less demanding clock targets, and is assumed by some to be less complex. Assuming that Latte's SPU blocks under these circumstances would be roughly half the density of Ontario's is very, very strange. How could that be?
So there it is - if you assume that points 1 and 2 are true, then you paint yourself into a very difficult corner where you have to justify how either point 4a or 4b could be correct.
Personally I prefer to question point 1. AMD has been modifying their VLIW GPU architecture for the better part of a decade by now; of course they can change (the appearance of) the SRAM blocks!
In which case 320 SPUs is a good match for all data we have.
I think you are right IF there is indeed 32MB of eDRAM on the GPU die; I don't remember Nintendo confirming that information. Though since nobody, in rumors or leaks, has questioned that amount, I think you are right that we might be dealing with 320 SPs.
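The "factor of two" between 55nm and 40nm that the density argument leans on can be sanity-checked with ideal area scaling. This is only a back-of-envelope sketch: it assumes every feature shrinks linearly with the node name, which real processes only approximate, and the function name is made up for illustration.

```python
def ideal_density_gain(old_node_nm: float, new_node_nm: float) -> float:
    """Ideal area-scaling factor when shrinking from old_node_nm to new_node_nm.

    Assumes linear dimensions scale with the node name, so area (and hence
    density) scales with the square of the ratio.
    """
    return (old_node_nm / new_node_nm) ** 2


gain = ideal_density_gain(55, 40)
print(f"55nm -> 40nm ideal density gain: {gain:.2f}x")

# Hypothetical: how many SPs the area of 160 SPs at 55nm-class density
# would hold if the same design were packed at ideally scaled 40nm density.
print(f"160 SPs' worth of area at 40nm density: ~{160 * gain:.0f} SPs")
```

Under ideal scaling the gain comes out a little under 2x, which is the size of the gap described in point 4b.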
... (Renesas vs TSMC). I have attempted to clarify in the past (although it seems not to have taken hold, unfortunately) that the 40nm TSMC was Jim's guess or a guess from one of his colleagues after taking an initial glance at the die. I followed up with him shortly after on the subject and he said that they had not performed any precise gate measurements, and that 40nm and 55nm were actually pretty hard to tell apart without getting those figures.
I think you are right IF there is indeed 32MB of eDRAM on the GPU die
It can't be 16MB (half the RAM to fit 55nm rather than 40), because allegedly the Wii U uses eDRAM to emulate the Wii's 1T-SRAM main memory, and the Wii/GC has 24MB of 1T-SRAM main memory. Also, it probably won't be 24MB of eDRAM on the die if the number of banks visible doesn't seem to match, i.e. an even power of 2.
Grall said: It can't be 16MB (half the RAM to fit 55nm rather than 40), because allegedly the Wii U uses eDRAM to emulate the Wii's 1T-SRAM main memory, and the Wii/GC has 24MB of 1T-SRAM main memory. Also, it probably won't be 24MB of eDRAM on the die if the number of banks visible doesn't seem to match, i.e. an even power of 2.
So it's probably 32MB after all. :smile:
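The elimination above can be written out as a tiny filter. This is just a sketch of the reasoning as stated in the quoted post: the 24MB floor and the power-of-two bank constraint are the poster's assumptions, not confirmed specs, and the names are invented for the example.

```python
WII_MAIN_MEMORY_MB = 24  # 1T-SRAM main memory that must be emulated, per the post


def is_power_of_two(n: int) -> bool:
    """True if n is a positive power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def plausible_edram_sizes(candidates_mb):
    """Keep sizes that can hold Wii main memory and match a power-of-two layout."""
    return [
        mb for mb in candidates_mb
        if mb >= WII_MAIN_MEMORY_MB and is_power_of_two(mb)
    ]


print(plausible_edram_sizes([16, 24, 32]))  # [32]
```

16MB fails the capacity check, 24MB fails the power-of-two check, and only 32MB survives both.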
The problem is that the 5th execution unit in AMD's VLIW5 actually does unique operations, so you can't just drop it. Cayman redistributed the operations to the other four units.
Yeah, I meant drop one out of every five ALUs, specifically one of the simple ALUs, rather than the actual 5th complex ALU. Keeping the dedicated t-unit should involve fewer hardware and driver changes than implementing 4 beefed-up ALUs.