Trinity vs Ivy Bridge

Are you saying two Piledriver modules @ 3.8->4.2GHz aren't enough to beat a dual-core Sandybridge i3 @ 3.1GHz?!
I find that way too pessimistic.
I think that's realistic rather than pessimistic :(.
I suspect that, clock for clock, Trinity isn't really much (if anything) faster than Zambezi - it will have slightly higher IPC, but the loss of L3 probably about evens this out.
And just look at these results for an FX4100 (3.7GHz / 3.8GHz Turbo):
http://ht4u.net/reviews/2011/amd_fx_6100_4100_review/index31.php
There's an i3 2120 in there (3.3GHz, no Turbo).
There's just no contest in single-thread performance, and even multithreaded the FX4100 just draws even.
Hence the Pentium (with no HT and a slightly slower clock) would lose in multithreaded performance but still demolish the FX4100 in single-threaded performance.
Granted, the fastest A10 has a slightly higher clock (10% with Turbo; otherwise the difference is insignificant, so you'd better hope it can make good use of Turbo), but that will be nowhere near enough to catch a Pentium in single-threaded performance. It should extend the lead in multithreaded apps and be enough to beat the i3 in that area - but given the huge difference in single-thread performance I still wouldn't say it's faster overall; that will depend entirely on the workload.
You were using a slightly slower i3, but that doesn't change the picture much (it's just a 6% difference), though yes, if your focus is on multithreaded performance, the A10 should win against that, and the single-thread gap would shrink.
Of course that's ignoring any possible Ivy Bridge chips at that time...
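For what it's worth, the argument above can be framed as single-thread performance ≈ IPC × clock. A quick sketch in Python; the relative-IPC figures are purely illustrative assumptions (not measurements from the linked review), just to show why a ~10% clock advantage can't close a large per-clock deficit:

```python
# Rough framing: single-thread performance ~ IPC * clock.
# The relative-IPC figures below are illustrative assumptions, NOT measurements.

chips = {
    # name             (relative IPC vs Sandy Bridge, clock in GHz incl. Turbo)
    "FX-4100":         (0.65, 3.8),
    "i3-2120":         (1.00, 3.3),
    "A10 (Trinity?)":  (0.70, 4.2),   # assumes a small per-clock gain over Zambezi
}

base_ipc, base_clock = chips["i3-2120"]
base_perf = base_ipc * base_clock

for name, (ipc, clock) in chips.items():
    perf = ipc * clock
    print(f"{name:15s} ~{perf / base_perf:4.2f}x of the i3-2120, single-threaded")
```

Plug in whatever per-clock ratio you believe; the conclusion barely moves unless the IPC gap nearly vanishes.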
 
Yeah I wouldn't expect the loss of the L3 to have much of a negative impact in "consumer" workloads. If taking out the L3 reduces memory latency, it might even help in some cases.
 
I wouldn't count Bulldozer's L2 and L3 caches as very important factors for its gaming performance. If all those technical reports about Bulldozer's caches being terribly slow are true, it could be that Piledriver's 4MB of L2 cache ends up a lot more efficient for games than Bulldozer's 8MB L2 + 8MB L3.

For example, Llano actually has about the same performance per clock as Phenom II X4.
Yes, but I'm pretty sure it's mostly because the per-core L2 was doubled. Without that you'd have some performance loss (still not that much).

Yeah I wouldn't expect the loss of the L3 to have much of a negative impact in "consumer" workloads. If taking out the L3 reduces memory latency, it might even help in some cases.
Whether it actually reduces memory latency, we'll see (though you're right, it could be enough to make up for the loss of L3 cache - I'm pretty sure it won't be enough to make it faster, at least not on average). Anyway, even if you think taking out the L3 means absolutely squat for performance, you're still talking about pretty minimal IPC gains; just shift the comparison up by two speed grades for the i3 (they go up to 3.4GHz nowadays anyway, and the 3.1GHz ones might not be popular anymore when Trinity launches, not to mention IVB).
 
Yes, but I'm pretty sure it's mostly because the per-core L2 was doubled. Without that you'd have some performance loss (still not that much).


Whether it actually reduces memory latency, we'll see (though you're right, it could be enough to make up for the loss of L3 cache - I'm pretty sure it won't be enough to make it faster, at least not on average). Anyway, even if you think taking out the L3 means absolutely squat for performance, you're still talking about pretty minimal IPC gains; just shift the comparison up by two speed grades for the i3 (they go up to 3.4GHz nowadays anyway, and the 3.1GHz ones might not be popular anymore when Trinity launches, not to mention IVB).
At least it will get higher performance per die size (not counting the GPU part as part of the die size).
 
The big deal about the cache isn't really the L3, it's how horribly slow the L2 is. If they have managed to speed up the *L2* access because of the lack of L3, or even just because they will have had more time to work on it, it would show up pretty dramatically.

My totally unscientific 2c is that if the L2 latency goes down by just 3 cycles, that will count for more than the loss of L3.
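To make that 2c slightly less unscientific, the trade-off can be framed as average memory access time (AMAT). The latencies and miss rates below are made-up round numbers, not Bulldozer/Trinity measurements; the point is only that whether a 3-cycle-faster L2 beats having an L3 hinges on the L2 miss rate and the memory latency.

```python
# Average memory access time (AMAT) sketch. All latencies (cycles) and miss
# rates are made-up illustrative numbers, NOT measured Bulldozer/Trinity values.

def amat(l1_lat, l1_miss, l2_lat, l2_miss, mem_lat, l3_lat=None, l3_miss=None):
    """AMAT for a 2- or 3-level hierarchy; miss rates are local per level."""
    if l3_lat is None:                      # no L3: L2 misses go straight to memory
        beyond_l2 = mem_lat
    else:                                   # with L3: L2 misses go to L3 first
        beyond_l2 = l3_lat + l3_miss * mem_lat
    return l1_lat + l1_miss * (l2_lat + l2_miss * beyond_l2)

# "Bulldozer-like": slow L2, plus an L3 behind it.
with_l3 = amat(l1_lat=4, l1_miss=0.10, l2_lat=21, l2_miss=0.10,
               mem_lat=200, l3_lat=45, l3_miss=0.50)

# "Trinity-like": L2 three cycles faster, no L3, memory assumed a touch closer.
without_l3 = amat(l1_lat=4, l1_miss=0.10, l2_lat=18, l2_miss=0.10, mem_lat=185)

print(f"with L3:    {with_l3:.2f} cycles")   # ~7.6 with these numbers
print(f"without L3: {without_l3:.2f} cycles")  # ~7.7 with these numbers
```

With these particular numbers it's roughly a wash; crank the L2 miss rate up and the L3 starts to matter, drop it and the faster L2 wins - which is really all the 2c amounts to.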
 
There's a picture somewhere showing how the clock mesh inductors are placed along those dividing lines in the Trinity module.
This does explain the thicker lines seen in the die shot earlier in this thread.


By the way, presentations from Hot Chips 23 are now public.
http://www.hotchips.org/conference-archives/hot-chips-23

There is a presentation from AMD concerning dynamic power management.
In it, there is discussion on how Llano can drive over TDP, taking advantage of the capacity of the package and thermal solution to absorb the spike.
There are also slides concerning electrical concerns around clock and power gating, as well as substrate and integrated management of current and voltage variations due to load and conditions.
The actual physical realities of how voltage and current vary show how complex gating and varying clock can be.
Those data sheet constants are not so constant in real life, which looks to be one contributing reason as to why TDP is averaged over a longer period of time.
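A toy way to picture why averaging TDP over a longer period matters: instantaneous power can poke above the nominal TDP as long as the windowed average stays under it, because the package and heatsink thermal mass absorbs the excursion. This is just an illustrative moving-average model with made-up numbers, not AMD's actual controller.

```python
# Toy illustration (not AMD's actual power-management algorithm): a brief power
# spike above the nominal TDP can still leave the time-averaged power under the
# limit, thanks to the thermal mass of the package and heatsink.

TDP_W  = 100.0                               # assumed nominal TDP
WINDOW = 10                                  # averaging window, in samples (assumed)

# Hypothetical per-sample package power draw (watts).
samples = [80] * 5 + [130] * 3 + [85] * 12   # short spike to 130 W

for t in range(len(samples)):
    window = samples[max(0, t - WINDOW + 1): t + 1]
    avg = sum(window) / len(window)
    note = "  <-- above instantaneous TDP" if samples[t] > TDP_W else ""
    print(f"t={t:2d}  power={samples[t]:5.1f} W  windowed avg={avg:6.1f} W{note}")
```

In this toy trace the instantaneous draw exceeds 100 W for three samples, yet the 10-sample average never does.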
 
I eyeballed 2.5 cores.

So when does the GPU overtake the CPU in terms of area? Broadwell?

Did you count the L3? It's shared with the GPU, so I'm not sure how it should be counted.

It could be sooner than Broadwell, i.e. Haswell. After all, Trinity's already there.
 
I eyeballed 2.5 cores.
The boxes are actually drawn quite inexactly; I get 212x255 pixels for the GPU (including the GPU interface but not the display controller), and 165x357 for the cores (just the cores, without L3 or anything). In other words, the GPU is 90% of the size of 4 cores.
So when does the GPU overtake the CPU in terms of area? Broadwell?
Well, it seems to be almost there already (certainly AMD is leading in that metric :)). If the rumors of GT1/GT2/GT3 variants are true for Haswell, it seems a very safe bet that the GT3 version will be larger than 4 Haswell cores.

Edit: actually, Intel is already there too with some CPUs. Sandy Bridge 2-core / GT2 has more die area for the GPU than the cores if you don't count the L3. IVB 2-core / GT2 has more die area for the GPU than the cores even if you count the L3 (though I really don't want to count it).
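For reference, the ~90% figure above is just box arithmetic on those pixel measurements (a quick check in Python, using the numbers from the post):

```python
# Quick check of the eyeballed die-shot boxes quoted above (pixel measurements
# from the post; the boxes themselves are admittedly drawn inexactly).
gpu_px   = 212 * 255   # GPU incl. GPU interface, excl. display controller
cores_px = 165 * 357   # four cores only, no L3

print(f"GPU box:    {gpu_px} px^2")               # 54060
print(f"cores box:  {cores_px} px^2")             # 58905
print(f"GPU / cores = {gpu_px / cores_px:.2f}")   # ~0.92, i.e. roughly 90%
```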
 
Did you count the L3? It's shared with the GPU, so I'm not sure how it should be counted.
Yeah. It's not coherent with the GPU, it's sized according to the CPU, and so on, which is why I consider it part of the CPU cluster.
It could be sooner than Broadwell, i.e. Haswell.
Possibly, yeah. The GT3 variant should definitely have more area devoted to GPU.

After all, Trinity's already there.
The awfulness of their CPUs kinda rules them out. :cry:
 
The awfulness of their CPUs kinda rules them out. :cry:
Yes, just like Phenom II was pure crap because Phenom "1" was? ;)
And in all seriousness, even though AMD's CPU cores are weaker than Intel's, that doesn't mean they're actually too bad to use.
 
So the HD4000's graphics performance is ~50% above HD3000.

I'd say the GPU performance margin between Sandybridge and Llano won't change much for Ivybridge and Trinity.

I haven't actually computed the average, but +50% seems to be the upper bound, so it's probably around 40%. And Llano is still ahead in all benchmarks, often by a wide margin.

This is more reasonable than the >2× claims we were hearing, although I suppose the picture might slightly improve with better drivers.

HD 4000 does very well in the (single) Compute benchmark, though.
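If one did want to compute that average from per-benchmark ratios, a geometric mean is the usual choice. The ratios below are placeholders for illustration only, not the actual review numbers:

```python
from math import prod

# Placeholder HD 4000 / HD 3000 per-benchmark speedup ratios -- hypothetical
# values for illustration only, NOT the review's actual numbers.
ratios = [1.50, 1.35, 1.42, 1.28, 1.46]

geomean = prod(ratios) ** (1.0 / len(ratios))
print(f"geometric mean speedup: {geomean:.2f}x")   # ~1.40x with these placeholders
```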
 