Trinity vs Ivy Bridge

Does anybody see more than 6 SIMD engines (384 SPs) in that die shot of Trinity?

Trinity_Die_Low_wm.png
Is Trinity supposed to have a new revision of the BD architecture? I think there's an additional SRAM bank to the instruction pre-decode array in the front-end, compared to the current revision of BD. :???:
 
Is Trinity supposed to have a new revision of the BD architecture? I think there's an additional SRAM bank to the instruction pre-decode array in the front-end, compared to the desktop revision of BD. :???:
Trinity is Piledriver-based.
 
You're right, Llano indicates ~600 GFLOPS, Trinity indicates ~1100 GFLOPS, making the difference ~78% according to that chart. Obviously there's room for interpretation based on how thick the line is and where the labels are placed, but that's how I see it anyway.
No, I think 1100 GFLOPs is for that ominous "2013 platform" (Trinity successor). If you look very closely, you see a slight bend in the line just above the 800 mark. So Trinity seems to be in the 800-850 range of that chart.
 
Ahhh, yes I do see that. Ok, so Llano is ~600, Trinity is somewhere around the 850 mark. That's not too far off from 50% depending on the rounding error on Llano. I mean, if we look at the backside-kink of that line, Llano might be ~550 ;)
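
For what it's worth, a quick sketch of that arithmetic. The inputs are the values eyeballed from the chart in this thread (Trinity ~850, Llano ~600 or ~550), not official figures, and the helper name is just for illustration:

```python
def percent_increase(old, new):
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100.0

# Chart readings from this thread; these are eyeballed, not measured.
trinity_gflops = 850.0
for llano_gflops in (600.0, 550.0):
    gain = percent_increase(llano_gflops, trinity_gflops)
    print(f"Llano {llano_gflops:.0f} -> Trinity {trinity_gflops:.0f}: +{gain:.1f}%")
```

So the "50%" claim sits between the two readings: roughly +42% if Llano is 600 and roughly +55% if it is 550.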

The real deal is that it's just a really terrible graph, and given the multiple sources posted above, it is obviously wrong and should be dismissed.
 
Scaled comparison of Llano and Trinity, using the I/O pads on the left side for reference:

58228576.png


Some observations on the layout of the SIMD multi-processors -- the placement of the register file banks in the ALU array is different in Trinity, as well as the whole layout of the texture unit.

Here are the differences (so far) on the CPU side -- BD vs. Piledriver cores:

16108893.jpg


Those banks are most probably the pre-decode bits (used for the BTB, branch selector, end bits, etc.) that AMD has been using ever since the first K7 architecture to aid the instruction decode flow. And since these are located in the branch prediction area of the front-end block, I guess AMD is aiming at improving precisely this aspect of the architecture.
 
They "promised" 30% higher GPU performance and 715 GFLOPS (but those Llano numbers are wrong).
So it's 20% more CPU perf + 30% more GPU perf = 50% more perf! :LOL:

I am personally speculating there will be 2 x 256-bit FMACs in each module, which would double peak FLOPS, with a clock boost on top. So over 200 GFLOPS from the CPU alone, so the GPU won't have to be clocked that high to reach the projected total GFLOPS values. But since I might be the only one thinking that, I could be very wrong. :D
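
A rough sketch of how a figure like "over 200 GFLOPS" could come about. The two-module configuration and the 3.2 GHz clock are my assumptions for illustration and do not come from this thread; the baseline of two 128-bit FMACs per module is the current BD design:

```python
def peak_sp_gflops(modules, fmacs_per_module, fmac_bits, clock_ghz):
    """Theoretical single-precision peak, counting one FMA as 2 flops."""
    lanes = fmac_bits // 32  # 32-bit lanes per FMAC pipe
    flops_per_cycle = modules * fmacs_per_module * lanes * 2
    return flops_per_cycle * clock_ghz

MODULES = 2      # assumption: a two-module part
CLOCK_GHZ = 3.2  # assumption: hypothetical clock, not a spec

current = peak_sp_gflops(MODULES, 2, 128, CLOCK_GHZ)     # BD-style FPU
speculated = peak_sp_gflops(MODULES, 2, 256, CLOCK_GHZ)  # the speculated FPU
print(f"2 x 128-bit FMAC: {current:.1f} GFLOPS")
print(f"2 x 256-bit FMAC: {speculated:.1f} GFLOPS")
```

Under those assumptions the wider FPU lands just over 200 GFLOPS, double the 128-bit case at the same clock.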
 
It is (or at least it was supposed to be) 50% more FLOPS on the GPU for 30% more performance in actual games, and up to 20% more performance on the CPU side for common applications.

The FPU appears to be largely unchanged, so no 256-bit FMACs.
 
What strikes me as odd is the GPU in Trinity. The 6 VLIW4(?)-SIMDs only take up ~as much space as the 5 SIMDs in Llano, yet the "uncore" of the Trinity GPU is MUCH larger and appears to be the only reason why Trinity is larger than Llano. Any idea what all that space is used for? Larger cache(s) to reduce memory bandwidth bottlenecks?
 
Compared to the ALU blocks, the rest of Llano's GPU is 3.78 times bigger. But 4.75 times bigger for Trinity (rough numbers). I would expect exactly opposite numbers... :???:
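
Put another way, a small sketch that turns those ratios into the share of the GPU taken by the ALU blocks. It assumes "N times bigger" means rest-of-GPU area = N x ALU area, and the function name is only illustrative:

```python
def alu_share(rest_to_alu_ratio):
    """Fraction of total GPU area used by ALU blocks, given rest/ALU area ratio."""
    return 1.0 / (1.0 + rest_to_alu_ratio)

# Rough ratios measured from the die shots in this thread.
for name, ratio in (("Llano", 3.78), ("Trinity", 4.75)):
    print(f"{name}: ALU blocks are {alu_share(ratio) * 100:.1f}% of the GPU")
```

That gives about 20.9% for Llano and about 17.4% for Trinity.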
 
Would you? Cayman had fewer shaders than Cypress, but was significantly bigger. And (presumably) it didn't have as much redundancy for vias and stuff.
 
Cayman had 24 SIMDs vs. Cypress' 20. So even though that's slightly fewer ALUs, that's 20% more texture units, L1 cache, LDS memory, etc.
 
The GPUs of Trinity vs. Llano exhibit the exact same ratio as Cayman vs. Cypress (trading five VLIW5 vs. six VLIW4 SIMD engines). ;)
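
A minimal sketch of that ratio, assuming the usual layout for these VLIW parts of 16 lanes and 4 texture units per SIMD engine (those two constants are my assumption, not stated in this thread):

```python
LANES_PER_SIMD = 16  # assumption: thread processors per SIMD engine
TMUS_PER_SIMD = 4    # assumption: texture units per SIMD engine

def gpu_resources(simds, vliw_width):
    """ALU and texture-unit counts for a given SIMD count and VLIW width."""
    return {
        "alus": simds * LANES_PER_SIMD * vliw_width,
        "tmus": simds * TMUS_PER_SIMD,
    }

pairs = {
    "Cypress -> Cayman": (gpu_resources(20, 5), gpu_resources(24, 4)),
    "Llano -> Trinity": (gpu_resources(5, 5), gpu_resources(6, 4)),
}
for name, (old, new) in pairs.items():
    alu_ratio = new["alus"] / old["alus"]
    tmu_ratio = new["tmus"] / old["tmus"]
    print(f"{name}: ALUs {old['alus']} -> {new['alus']} (x{alu_ratio:.2f}), "
          f"TMUs {old['tmus']} -> {new['tmus']} (x{tmu_ratio:.2f})")
```

Both pairs come out at 0.96x the ALUs and 1.20x the texture units, which matches the 384 SPs counted at the top of the thread.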
 
I am personally speculating there will be 2 x 256-bit FMACs in each module, which would double peak FLOPS, with a clock boost on top. So over 200 GFLOPS from the CPU alone, so the GPU won't have to be clocked that high to reach the projected total GFLOPS values. But since I might be the only one thinking that, I could be very wrong. :D

The present desktop BDs cannot keep the FPU fed with data. What exactly would be the point of doubling the peak FLOPS when you are so bandwidth-starved that it would never increase real-world performance?
 
This is not a BD, this is Trinity. It has a different memory controller and different goals, and maintaining maximum throughput is not one of them. Going to 256-bit will happen someday anyway, the sooner the better.

PS. Source of your claim?
 
This is comparison of the IGP "uncore" sections of Llano and Trinity -- SIMDs are cut out too. Trinity's section takes 40% more area, compared to Llano's.

74272699.png
 
Any explanation? Cayman has bigger ROPs (EQAA, faster ops with single/dual-channel FP32, faster Int16, coalesced writes). Maybe VCE and 3rd display pipeline for Eyefinity have some impact, too. But I can't believe that improved ROPs, VCE processor and 3rd display output can be responsible for such a massive difference.
 
The pictures do not look like they have an equivalent level of detail, but if the Trinity shot is accurate and not obfuscated, the GPU section looks like it has fewer ordered grids that correspond to SRAM or customized logic than Llano. The kind of featureless pudding in between all the storage is visually similar to the RV770 die shot.

Perhaps AMD has allowed the logic on the periphery to bloat, due to more standard cells and automated layout. The logic sections may be physically bigger out of proportion to any transistor count increase, possibly to reduce leakage or variation.
 