Trinity vs Ivy Bridge

Masking the other core off showed a performance improvement, although it was pretty modest.
Because you gained decoding bandwidth on a single core.

There are issue restrictions on which EXE pipeline can do what, such as MUL and DIV, and branches can only use one pipeline.
Same as Intel's CPUs.

Edit: There's also the lack of move elimination, which is more noticeable with the claustrophobic two issue slots. Later iterations of the architecture will give the AGU ports the ability to handle moves, though. Intel's design does better.
The AGLUs handle only MOV R,M and MOV R,R in PD. Yet they are 'extra ALUs' only when they work this way, as those moves still require an additional MOP from the front end. And if the decoders already starve trying to parse 2 instructions/cycle, I don't see how they could parse 3 or 4 (or 5 with a fused jump...). Consider that AMD MOPs are 'fatter' than Intel's uops, and many instructions decode into a single MOP, for higher effective bandwidth.

The bigger problem is that general-purpose processors have trended towards being resilient enough not to require so much handholding.
Manual asm, where you count the issue ports, the memory stalls and decoder fetching, code alignment and memory prefetching, can easily give you a +100-200% boost in critical code.

@sebbbi: yeah, you're right. I meant exactly that... going from 1.0 to 1.5 would give a big boost, since it applies to both cores. Once all moves can be executed by the AGLUs it will be slightly better still, since MOVs are pretty ubiquitous in code, so you'd gain even more. So a better front end can surely improve AMD's performance quite a bit. Sure, it still won't catch up to Intel (and since I need single-thread performance at the moment, you can guess which processor I use; btw, I hope they'll use fluxless solder for Ivy later on -.- ).
 
Because you gained decoding bandwidth on a single core.
It also gained use of the full L2, branch predictor, and FPU, and halved contention for the writeback path.
The decoder is one of many things that went into that gain.
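
To make the shared-front-end point concrete, here's a trivial back-of-envelope sketch. It assumes the module's 4-wide decoder serves one thread per cycle and simply alternates when both cores are active; that's a simplified model of BD/PD fetch/decode, and the numbers are only illustrative.

```c
/* Simplified model of a shared 4-wide module decoder: with both cores
 * active, each thread averages at most ~2 decoded MOPs/cycle; with the
 * sibling core masked off, a single thread can see up to 4/cycle.
 * Illustrative only; real fetch/decode behaviour is burstier. */
#include <stdio.h>

int main(void)
{
    const double decode_width = 4.0;          /* MOPs/cycle, per module */

    for (int active = 2; active >= 1; --active)
        printf("%d active core(s): up to %.1f MOPs/cycle per thread\n",
               active, decode_width / active);
    return 0;
}
```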

Same as Intel's CPUs.
It's distributed amongst three ports, not two. Having a third port that can handle a branch allows for things like handling a branch and a MUL at the same time. This is on top of a 50% advantage in most other integer ops.

The AGLUs handle only MOV R,M and MOV R,R in PD. Yet they are 'extra ALUs' only when they work this way, as those moves still require an additional MOP from the front end. And if the decoders already starve trying to parse 2 instructions/cycle, I don't see how they could parse 3 or 4 (or 5 with a fused jump...).
Without optimizations or tricks such as eliminating moves or allowing them to run on the AGU ports, there is no starvation because the back end is effectively 2-wide 90% of the time, and the actual ALU ports get stuck with a MOV when they could be doing something else.
The primary downside to running MOV ops through the AGU ports is that it could add latency for reg/mem ops, although it's still a win over BD, which effectively delays them for every op anyway.
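
For anyone wondering how much this can matter, here's a rough upper-bound calculation. The 20% reg-reg MOV fraction and the assumption of purely ALU-bound code are mine, just to put a number on it; they're not Piledriver measurements.

```c
/* Upper-bound estimate of what move elimination (or pushing MOVs to the
 * AGU/AGLU ports) could buy on a 2-wide integer back end.  Assumes
 * purely ALU-bound code and a 20% reg-reg MOV mix -- both illustrative
 * assumptions, not measured data. */
#include <stdio.h>

int main(void)
{
    const double insts        = 100.0;  /* instruction window            */
    const double mov_fraction = 0.20;   /* assumed share of reg-reg MOVs */
    const double alu_width    = 2.0;    /* BD/PD: two ALU issue slots    */

    /* MOVs compete with real work for the two ALU slots. */
    const double cycles_movs_on_alu = insts / alu_width;

    /* MOVs eliminated or handled by otherwise-idle AGU ports:
     * only the non-MOV work needs the ALU slots. */
    const double cycles_movs_free = insts * (1.0 - mov_fraction) / alu_width;

    printf("MOVs stealing ALU slots   : %.1f cycles\n", cycles_movs_on_alu);
    printf("MOVs eliminated/offloaded : %.1f cycles\n", cycles_movs_free);
    printf("upper-bound speedup       : %.0f%%\n",
           (cycles_movs_on_alu / cycles_movs_free - 1.0) * 100.0);
    return 0;
}
```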

Manual asm, where you count the issue ports, the memory stalls and decoder fetching, code alignment and memory prefetching, can easily give you a +100-200% boost in critical code.
The vast majority of code they face isn't manually optimized, and Intel beats AMD in a vast swath of that as well.
It's also not worthwhile outside of a few niches, since optimizing that heavily for BD does not give commensurate benefits over just going with the competing design.
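
For context on the kind of hand-tuning being talked about, here's a minimal sketch of a couple of the techniques mentioned (software prefetch and alignment hints), using GCC/Clang builtins. The prefetch distance and the 64-byte alignment promise are arbitrary illustrative choices; whether any of this pays off depends entirely on the workload and the memory system.

```c
/* Minimal sketch of two hand-tuning techniques: software prefetching
 * and alignment hints.  GCC/Clang builtins; the prefetch distance
 * (256 elements) is an arbitrary illustrative choice. */
#include <stddef.h>

void scale_add(float *restrict dst, const float *restrict src,
               float k, size_t n)
{
    /* Promise 64-byte alignment so the compiler can emit aligned vector
     * loads/stores.  The caller must actually guarantee this. */
    dst = __builtin_assume_aligned(dst, 64);
    src = __builtin_assume_aligned(src, 64);

    for (size_t i = 0; i < n; ++i) {
        /* Request the data ~256 elements ahead so the load below is less
         * likely to stall.  (A tuned version would prefetch once per
         * cache line rather than once per element.) */
        if (i + 256 < n)
            __builtin_prefetch(&src[i + 256], 0 /* read */, 0 /* low locality */);
        dst[i] += k * src[i];
    }
}
```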
 
Quick silly question: the Radeon Memory Bus is 256-bit per memory channel... isn't that overkill for each 64-bit DDR3 channel, or is it there to compensate for the difference in clocks (transfer rates)?
 
I wish these reviews would throw in a desktop setup that offers similar performance to the notebook lineup they are reviewing. The notebook and desktop hardware having similar naming but very different clocks makes it hard to get an idea of where everything sits in the grand scheme.
 
Quick silly question: the Radeon Memory Bus is 256-bit per memory channel... isn't that overkill for each 64-bit DDR3 channel, or is it there to compensate for the difference in clocks (transfer rates)?

There was some speculation that it had to do with the bus also being used for communication between the CPUs and GPU.
Another possibility is that this is a wide bus running with a divider to the NB/controller clock, so multiple transfers will build up in a buffer before making it over.

And maybe room to grow...?
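
To put rough numbers on the divider idea: a 256-bit port only needs a fairly low clock to keep up with one 64-bit DDR3 channel. The DDR3-1600 figure and the 400MHz internal clock below are assumptions for illustration; the actual Trinity NB clocks aren't stated anywhere in this thread.

```c
/* Bandwidth matching for a wide internal port vs. one DDR3 channel.
 * DDR3-1600 and the 400 MHz internal clock are assumed, illustrative
 * figures -- not published Trinity numbers. */
#include <stdio.h>

int main(void)
{
    const double dram_rate_mtps = 1600.0;        /* DDR3-1600: 1600 MT/s  */
    const double dram_bytes     = 64.0 / 8.0;    /* 64-bit channel        */
    const double dram_gbs       = dram_rate_mtps * dram_bytes / 1000.0;

    const double bus_bytes      = 256.0 / 8.0;   /* 256-bit internal port */
    const double bus_clock_mhz  = 400.0;         /* assumed divided clock */
    const double bus_gbs        = bus_clock_mhz * bus_bytes / 1000.0;

    printf("DDR3-1600, 64-bit channel : %.1f GB/s\n", dram_gbs); /* 12.8 */
    printf("256-bit port at 400 MHz   : %.1f GB/s\n", bus_gbs);  /* 12.8 */
    return 0;
}
```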
 
Quick silly question: the Radeon Memory Bus is 256-bit per memory channel... isn't that overkill for each 64-bit DDR3 channel, or is it there to compensate for the difference in clocks (transfer rates)?

I think it's used for internal communication too, i.e. CPU<->GPU.
 
As expected, the Trinity dGPU starts to shine when detail levels are cranked up.

http://hothardware.com/Reviews/AMD-Trinity-A104600M-Processor-Review/?page=9

[Image: JustCause2.png]



compared to this (from http://www.pcper.com/reviews/Mobile...vy/Performance-Synthetic-3D-Real-World-Gaming )

[Image: justcause2.png]


Also, this is a nice review with some interesting pictures, like what the resonant clock grid looks like, but it's in Polish:

http://pclab.pl/art49830.html (Google translate: http://translate.googleusercontent....0.html&usg=ALkJrhioi8gOlFNaHzq-bLwyPXKE6PVj2w )


[Image: cewka_w_krzemie.png (coil in silicon)]


Overall, I'm quite impressed by the CPU improvement compared to the A8-3500M! In places it's not far off Sandy Bridge based products, but some areas are still severely lacking compared to Intel.
Overall a good job from AMD, but they need to take two more steps like that in the CPU department with Steamroller to be more or less where Intel is at the moment.
 
Another point to consider is power steering across the CPU and GPU.

Good point!
This brings up questions you might not be in a position to answer yet:
- will the desktop version feature dGPU boost, or is the 100W TDP enough to keep it at its maximum (800MHz?) clock all the time?
- does the dGPU scale by frequency only, or by both frequency and voltage?

BTW, really nice power saving features. I like that you scale virtually everything to save power, even the memory clock. The only bit I'm not very impressed by is VCE. Why is it not as fast as the Intel or nVidia equivalents?
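
To put rough numbers on the frequency-vs-voltage question above: dynamic power scales roughly as C·V²·f, so dropping voltage along with frequency saves disproportionately more. The operating points below are made up purely for illustration, not Trinity's real DVFS table.

```c
/* Why frequency+voltage scaling beats frequency-only scaling:
 * dynamic power ~ C * V^2 * f.  The 800 MHz / 1.00 V and
 * 500 MHz / 0.85 V points are invented for illustration. */
#include <stdio.h>

static double rel_power(double f_mhz, double volts)
{
    return f_mhz * volts * volts;   /* C folded into the ratio */
}

int main(void)
{
    const double full  = rel_power(800.0, 1.00);
    const double fonly = rel_power(500.0, 1.00);  /* frequency only      */
    const double dvfs  = rel_power(500.0, 0.85);  /* frequency + voltage */

    printf("500 MHz, same voltage : %.0f%% of full power\n", 100.0 * fonly / full);
    printf("500 MHz at 0.85 V     : %.0f%% of full power\n", 100.0 * dvfs  / full);
    return 0;
}
```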
 
Do they have a choice? I doubt there's a huge market out there willing to pay for an expensive chassis without the best internals. Anand mentions a probable $600 ceiling for Trinity based laptops.

Don't they? Wasn't Intel stopping OEMs from putting cutting-edge ULV CPUs into sub-$1000 laptops until recently?
Of course I'm not saying AMD should set their prices that high (it'd be suicide), but they should draw a line somewhere..
If OEMs want to build a crappy machine where they save on everything, give them Atoms and Brazos.. Trinity is supposedly too valuable for that, and so is its branding potential.
 
I find it a bit strange that the games it loses to Ivy in are the same games that AMD loses heavily to Nvidia in. Batman, Dirt 3, Skyrim, etc.; that seems a bit too much of a coincidence to me.
So? Why would that make you conclude it was a driver issue? You do know that different games stress different parts of the GPU (and CPU), right?

The JC2 result is nice, but you sorta cherry-picked that one. Even in the same review, a lot of the other "high quality" results show more similar performance between Ivy and Trinity. I doubt there's a general trend here related to "quality" levels. I imagine it's more that there are just strengths and weaknesses of each architecture. The game/engine appears to be the primary factor that determines the relative performance.

It's also pretty hard to draw any conclusions based on stuff like "high vs low" quality, since these things are not created equal across games :) BF3 low quality looks better than most games' high quality ;)
 
So? Why would that make you conclude it was a driver issue? You do know that different games stress different parts of the GPU (and CPU), right?

I would conclude it's a driver issue when there are other factors pointing to that. For example...

http://www.hardwareheaven.com/revie...7970-crossfire-performance-review-skyrim.html

Crossfire 7970s getting blown away by SLI 580s in Skyrim. Just in case you missed it and thought it was down to unfavourable settings, it's at 5760x1080, where the 580 should be capitulating instead of winning.

As for Batman? http://www.legitreviews.com/article/1834/4/

So AMD performs poorly in these two games especially... and also has broken Crossfire drivers? And it just so happens both of these games are made by "Nvidia friendly" devs. And they are 2 losing games, out of the 15 (and probably 50) games tested, where Intel is faster than Trinity? The CPU argument simply doesn't wash here, because AMD loses heavily to Nvidia in the same games, with the same Intel CPU. How many coincidences does it take before you start to question what is really happening?

By all means, let's see the compelling case for your alternative explanation; I'm all ears. I'm quite content to believe I'm wrong, but it sure looks like drivers to me, and I don't think anyone can hold that against me considering the weight of evidence.

Edit - I didn't mention JC2, I mentioned Civ 5.
 
Negative scaling in this HH test... 46-47 fps with a max of 100 (which was surely a peak somewhere)... even the minimum is halved going from a single 7970 to CFX (CFX 17 fps vs. 34 fps single).

Same for CFX in Arkham City, it seems, in the Legit Reviews article.

Note that if you look at HH's 680 SLI test vs. the CFX one, Nvidia has the same problem in Skyrim at 1920x1080, so it seems there's some problem with Skyrim for both vendors (as said in the HH article, multi-GPU is really temperamental).
(Not all sites show those results, either.)

http://www.hardwareheaven.com/revie...li-performance-review-batman-arkham-city.html
 
Skyrim is frankly all over the place, and different sites will show different things, as they all have different test settings and it's a FRAPS-based walkthrough; TPU, for instance, shows a very different case. With regards to Batman, your link shows an 8x MSAA test, which is hardly what was run on Trinity in Anand's case. In that instance there was a specific issue that affected AA performance only, but that has already been addressed.
 
After looking at the review, it appears to me that there is a more massive improvement from the HD 3000 to the HD 4000 than from the AMD A8 to the A10.
 
Skyrim is frankly all over the place, and different sites will show different things, as they all have different test settings and it's a FRAPS-based walkthrough; TPU, for instance, shows a very different case. With regards to Batman, your link shows an 8x MSAA test, which is hardly what was run on Trinity in Anand's case. In that instance there was a specific issue that affected AA performance only, but that has already been addressed.

I'm aware that Skyrim is all over the place but it almost never looks good for AMD. TPU is the exception that proves the rule :p

I have seen pretty variable results for Batman as well, but that just makes me think that you (AMD) haven't optimised all of the game's levels whereas Nvidia has. Either that, or Nvidia is pushing its favoured levels harder in its reviewer guide...

It's far from just these games. When I look at the BF3 benchmarks and see ~30% in favour of the 680 (vs. the 7970) on AnandTech, and only 5% in its favour on Tom's... well, it makes me wonder what the real reasons behind it are.
 
I can't speak for Trinity, but change the AA method to FXAA only and the BF3 results seem quite different; in reality I see results all over the place for BF3 too, lol.

If you want the strangest result (but they explain why): http://lab501.ro/placi-video/nvidia-geforce-gtx-670-review/19
Different places in the game were tested.

Anyway, I'm sure there's some little thing to fix in BF3 and Skyrim... something doesn't really add up there (especially with the FXAA vs. 4xAA difference in BF3 on 7970s, the single- vs. multiplayer difference, etc.).
 
But wouldn't drivers increase performance over time? Is it really a bad thing to look forward to better drivers in the future?
 
Don't they? Wasn't Intel stopping OEMs from putting cutting-edge ULV CPUs into sub-$1000 laptops until recently?
Of course I'm not saying AMD should set their prices that high (it'd be suicide), but they should draw a line somewhere..
If OEMs want to build a crappy machine where they save on everything, give them Atoms and Brazos.. Trinity is supposedly too valuable for that, and so is its branding potential.

Why not?
If you're going to use a 17W APU and a cheap SSD, then you actually need less build quality to keep it reliable, versus a computer with double or triple the heat and an HDD.
This is how we got netbooks.

Netbooks already existed before "netbooks", but they were $3000 Sony machines or something. Then Asus came along and made us realize smaller computers can be cheaper.
If anything, the extra-slim laptops have fewer features (no Ethernet port), so why should they cost $1000?

As for a one-module Bulldozer, I think it's fine. It means you can still multitask when the browser is using 100% of one core with runaway JavaScript. Pair it with 8GB of RAM and you can throw anything at it.
 
Looking specifically at the Civilization V gaming performance, am I right in assuming that Trinity's GPU is severely starved by memory bandwidth?
Look, the only game in which it is well ahead is one that uses some kind of texture compression mechanism, which in turn is much faster even on Llano than on the HD 4000:
http://www.anandtech.com/Show/Index...lug=the-intel-ivy-bridge-core-i7-3770k-review
I don't think you can conclude much from Civ5. Driver implementation can make a huge difference in this game. Plus, lots of games use texture compression; Civ5's implementation just seems to be different from most.
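
For context on the bandwidth question above, here's the rough arithmetic on what the whole APU has to share. Dual-channel DDR3-1600 is an assumption; many mobile Trinity configurations ship with slower memory, so this is an upper bound.

```c
/* Shared DRAM bandwidth available to the whole APU (CPU cores + GPU),
 * assuming dual-channel DDR3-1600 -- an optimistic assumption for a
 * notebook.  The discrete-GPU line is just a reference point. */
#include <stdio.h>

int main(void)
{
    const double channels  = 2.0;
    const double rate_mtps = 1600.0;        /* DDR3-1600                 */
    const double bytes     = 64.0 / 8.0;    /* per transfer, per channel */
    const double apu_gbs   = channels * rate_mtps * bytes / 1000.0;

    /* A low-end discrete card with 128-bit GDDR5 at 4 GT/s, for scale. */
    const double dgpu_gbs  = 4000.0 * (128.0 / 8.0) / 1000.0;

    printf("APU shared DDR3 bandwidth : %.1f GB/s\n", apu_gbs);  /* 25.6 */
    printf("128-bit GDDR5 at 4 GT/s   : %.1f GB/s\n", dgpu_gbs); /* 64.0 */
    return 0;
}
```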
 