because you gained decoding bandwidth on a single core. Masking the other core off showed a performance improvement, although it was pretty modest.
same as Intel's CPU. There are issue restrictions on which EXE pipeline can do what, such as MUL and DIV, and branches can only use one pipeline.
The AGLUs handle only MOV R,M and MOV R,R in PD. Yet they only count as an 'extra ALU' when they work this way, as they require an additional MOP from the front end. And if they starve trying to parse 2 instr/cycle, I don't see how they could parse 3 or 4 (or 5 with a fused jump...). Consider that AMD MOPs are 'fatter' than Intel's, and many instructions decode into a single MOP for a higher bandwidth. Edit: There is also a lack of move elimination, which is more noticeable with the claustrophobic 2 issue slots. Later iterations of the architecture will give the AGU ports the ability to handle moves, though. Intel's design does better.
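To put rough numbers on the front-end argument, here is a toy sketch in Python. It is not a model of the real chip: the function name and the widths (a 4-wide decoder shared by two cores, 2 ALU + 2 AGLU issue slots per core) are assumptions picked only for illustration. It just says that a core cannot sustain more MOPs per cycle than its share of the decoder delivers, however many issue slots sit behind it.

```python
# Toy model only: all widths below are illustrative assumptions, not measurements.

def mops_per_cycle_bound(decode_width, cores_sharing_decoder, issue_slots):
    """Upper bound on MOPs per cycle that one core can sustain.

    The decoder is assumed to be split evenly between the cores sharing it,
    and the back end cannot issue more than `issue_slots` MOPs per cycle.
    """
    front_end_share = decode_width / cores_sharing_decoder
    return min(front_end_share, issue_slots)


# Hypothetical module: 4-wide shared decoder, 2 ALU + 2 AGLU slots per core.
both_cores_active = mops_per_cycle_bound(4, 2, 4)
other_core_masked = mops_per_cycle_bound(4, 1, 4)

print("both cores active:", both_cores_active)
print("other core masked:", other_core_masked)
```

In this sketch, extra issue slots for moves only help once the decoder share is large enough to feed them. It is only an upper bound, so a real gain from masking a core off can be much smaller, as noted above.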
manual asm where you count the issue ports, the memory stalls and the decoder fetching, and take care of code alignment and memory prefetching, can easily give you a +100% to +200% boost in critical code. The bigger problem is that general-purpose processors have trended towards being resilient enough not to require so much handholding.
@sebbbi: yeah, you are right. That's exactly what I meant... getting from 1.0 to 1.5 would give a big boost, since it's for 2 cores. Once all moves can be executed on the AGLUs it will be slightly better, since MOVs are pretty ubiquitous in code, and you'd stand to gain even more. So a better front end can surely improve AMD's performance quite a bit. Sure, it won't even catch up to Intel (and since I need single-thread performance atm, you can guess which processor I use; btw I hope they'll use fluxless soldering later for Ivy -.- ).