It allows for parallelism. Decoding ops is almost trivial, as it uses only a handful of opcode formats. What I mean is that you can pretty much start fetching registers without decoding the op.
Registers in ARM tend to be in the same fixed locations too. The question is, in any reasonably modern design would you actually want to start fetching operands that you might not even use?
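To make the fixed-field point concrete, here is a rough Python sketch (the function name is made up for illustration). It slices the rs/rt/rd specifiers out of a 32-bit MIPS word at their standard bit positions without ever looking at the opcode. For an I-type op like lw, the "rd" slice is just immediate bits, which is exactly the kind of operand fetch you might not end up using.

```python
def mips_reg_fields(word):
    """Slice register specifiers out of a 32-bit MIPS instruction word.

    The opcode (bits 31:26) is never inspected: rs, rt and rd always sit
    in the same bit positions, so a register read could start right away.
    """
    rs = (word >> 21) & 0x1F  # bits 25:21
    rt = (word >> 16) & 0x1F  # bits 20:16
    rd = (word >> 11) & 0x1F  # bits 15:11 (only meaningful for R-type)
    return rs, rt, rd


# add $t0, $t1, $t2 (R-type): all three fields are real registers
r_type = mips_reg_fields(0x012A4020)

# lw $t0, 4($t1) (I-type): the "rd" slice is part of the immediate
i_type = mips_reg_fields(0x8D280004)
```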
No additional dependencies like status registers or flags, rather easy out-of-order/super-scalar operation.
Coprocessor registers don't count as status registers? They needn't be any more tightly integrated on ARM CPUs per se.
Flags have their positive sides too. They save read/write ports on the main register file and save on the space you need to rename into. And they save on needing to address registers for comparison on conditional branches, which take away from displacement bits.
It's a headache to have instructions that only partially update the flags though, which is probably why they rectified this for AArch64.
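For a sense of scale on the displacement-bits point, here is a back-of-the-envelope sketch (the helper name is hypothetical). It assumes the classic encodings: MIPS beq spends 10 bits on two register specifiers and keeps a 16-bit signed word offset, while a 32-bit ARM conditional branch spends 4 bits on the condition and keeps a 24-bit signed word offset.

```python
def branch_reach_bytes(offset_bits, insn_bytes=4):
    """Byte range reachable with a signed word offset of offset_bits bits."""
    lo = -(1 << (offset_bits - 1)) * insn_bytes
    hi = ((1 << (offset_bits - 1)) - 1) * insn_bytes
    return lo, hi


mips_beq = branch_reach_bytes(16)  # compare-and-branch, registers in the op
arm_bcc = branch_reach_bytes(24)   # branch on flags, no register fields
```

That works out to roughly +/-128 KB for the MIPS form against roughly +/-32 MB for the ARM form, relative to the branch's base address.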
Useful synchronization primitives (ll/sc) since the early nineties (I'm stuck with a recent SoC using ARMv5 and swp - what a horror).
I shudder to think what SoC is using ARMv5 but can be considered "recent". Is it from Chinachip or something? :/ I'm not counting Atmel or Freescale products that were made forever ago but are still being sold.
No ugly "do this, if some bit is set do that, then do something else depending on what register you use" (thinking of ARM's load/store multiple, which does different things depending on whether you read/write the PC register, and has different modes when you are in an ISR - worst bit-but-ing possible).
Checking if (s == 1b) && (rd == 1111b) hardly seems like a huge decoding burden. There's no special function for reading PC. You could argue MIPS has a special function for writing r0.
Don't know what you mean by different modes in an ISR, except that the register file is muxed, but it's not like the instructions do different things...
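As a sketch of how small that check is, here it is in Python (the function name is invented, and this is only the predicate, not a real decoder). It assumes the classic 32-bit ARM data-processing layout, with S in bit 20 and Rd in bits 15:12. For LDM the analogous test would use the S bit in bit 22 and bit 15 of the register list instead.

```python
def is_s_with_pc_dest(word):
    """True for a data-processing op with S set and Rd == PC (r15)."""
    is_data_processing = ((word >> 26) & 0b11) == 0b00  # bits 27:26
    s = (word >> 20) & 1                                # S bit
    rd = (word >> 12) & 0xF                             # destination register
    return is_data_processing and s == 1 and rd == 0b1111


subs_pc_lr = is_s_with_pc_dest(0xE25EF004)  # subs pc, lr, #4
movs_pc_lr = is_s_with_pc_dest(0xE1B0F00E)  # movs pc, lr
adds_r0 = is_s_with_pc_dest(0xE0910002)     # adds r0, r1, r2
```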
The 32-bit ISA is a subset of the 64-bit ISA.
Yeah, they could afford to do that because they underutilized the opcode space so badly.
If you look at how ISAs devolved (at least internally), most converge toward MIPS-like (except with 20+ years of legacy crap on top).
I don't buy this. Some have called AArch64 a MIPS clone but I think it offers plenty that MIPS fails to.
1) Branch delay slots aren't a problem for me, neither for writing code nor for parallelism. They still help if you have an in-order CPU, and are easy to figure out out-of-order.
They don't really help you at all if you have branch prediction, regardless of whether you're in-order or not. You'll never fully utilize them, and when you don't, they waste code space. CPU designers now tend to see them as an unfortunate consequence of the times at best and a mistake at worst, but either way useless now.
3) No complex addressing modes - what for? Just use 2 instructions, or a macro if you write in assembly. Not a problem for either in-order or out-of-order. Keeps the ISA clean.
I think you mean three instructions, to do [reg + (reg * shift)]. You can't seriously be arguing that it doesn't matter if a sequence takes more instructions. Think of all the logic you need to try to make a sequence of three instructions dispatch in one cycle, and even if you can you still burn the fetch bandwidth and L1 space.
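To spell out the count, here is a small Python sketch (all names and register choices are illustrative). It assumes classic MIPS32, which has no scaled-index addressing for integer loads, against 32-bit ARM with a shifted register offset. Both sequences compute the same effective address; the difference is how many 4-byte instructions get fetched to do it.

```python
def effective_address(base, index, shift):
    """Address for [base + (index << shift)], wrapped to 32 bits."""
    return (base + (index << shift)) & 0xFFFFFFFF


# Loading a 4-byte element at base + index*4
MIPS_SEQUENCE = [
    "sll  $t2, $t1, 2",    # scale the index
    "addu $t2, $t0, $t2",  # add the base
    "lw   $t3, 0($t2)",    # do the load
]
ARM_SEQUENCE = [
    "ldr r3, [r0, r1, lsl #2]",  # scale, add and load in one op
]

INSN_BYTES = 4
mips_fetch_bytes = len(MIPS_SEQUENCE) * INSN_BYTES
arm_fetch_bytes = len(ARM_SEQUENCE) * INSN_BYTES
```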