Off-topic ftw
Guden Oden said:
Impressive! Well, at least it would have been if I'd been able to interpret ARM assembly opcodes!
Perhaps if you did some comparison with some other CPU architecture(s), x86, or MIPS (since PSP is MIPS-based), we (even those who, like me, lack most in the way of coding skillzz) would get a greater understanding of what makes ARM's implementation so powerful.
First off, it is very fast per clock, but that's an opportunity ARM had because they didn't aim for very high clocks anyway. I don't know how far it would go, but to give at least some perspective, the particular ARM core we're talking about runs at 33MHz in the DS. You won't see a single-cycle integer multiplication instruction in an architecture that needs to hit GHz clocks. Calling it "so powerful" is okay, but keep the overall design targets in mind when you do.
The opcodes are fairly straightforward if you've ever seen assembly listings.
ARM opcodes follow the usual "RISC" conventions. Each opcode has a destination register and source registers. For an addition you have this basic form:
add result_register,source_register1,source_register2
Registers are named r0,r1 ... to r15. So this is an actual ARM instruction:
add r1,r2,r3
This will add the integers that are currently in r2 and r3 and place the result in r1. It's standard procedure for most "RISC" architectures, including MIPS I believe.
Now for the bit-shifting, they apparently found that they had so many free bits in the instruction encoding (each ARM instruction is 32 bits wide), that they started blowing them on operand modifiers. Almost every ARM instruction allows an extended form of the second operand, instead of just the contents of a register. You can use:
a) The contents of a register shifted left or right, or rotated, by a 5-bit immediate value (that is, a value embedded into a field of the instruction itself).
b) The same, but shifted or rotated by a value sourced from another register (aka "register controlled shift").
c) The contents of a register rotated right by one bit through the carry flag, i.e. the old carry comes in at the top and the bit "rotated out" can be placed into the carry flag ...
d) An eight-bit immediate value that can be rotated right by an even number of bits (e.g. you can express 0xF000000F as 0xFF rotated right by four bits, and thus it's a valid "eight bit" immediate value).
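To make case d) a bit more tangible, here is a rough Python sketch (not ARM code, and not how an assembler necessarily does it) that checks whether a 32-bit constant fits the "eight bits plus rotation" form. The helper names ror32 and encode_immediate are made up for this illustration, and the sketch assumes the rotation amount is restricted to even values.

```python
MASK32 = 0xFFFFFFFF

def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32

def encode_immediate(value):
    """Return (imm8, rotation) if `value` equals imm8 rotated right by
    `rotation`, with imm8 <= 0xFF and an even rotation; otherwise None."""
    for rotation in range(0, 32, 2):          # assumption: even rotations only
        # Undo the rotation: rotating right by (32 - rotation) is a left
        # rotation by `rotation`, which must leave a plain 8-bit value.
        imm8 = ror32(value, (32 - rotation) % 32)
        if imm8 <= 0xFF:
            return imm8, rotation
    return None

print(encode_immediate(0xF000000F))  # (255, 4), i.e. 0xFF rotated right by 4
print(encode_immediate(240))         # (240, 0), fits without any rotation
print(encode_immediate(0x101))       # None, the set bits span nine positions
```

A constant that comes back as None here is one you would have to build in a register by other means.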
In "add r1,r1,r2,lsl #8", which is an instruction from the listing I gave in the last post, you have an example of this extended second operand. The instruction adds the current contents of r2 shifted left by eight bits to r1 and places the result back in r1. "lsl" as in logical shift left.
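For those who don't read assembly, the effect of "add r1,r1,r2,lsl #8" can be modelled in a few lines of Python. This is only a sketch of the arithmetic: the function names are invented here, registers are plain integers, and results are masked to 32 bits to mimic register width.

```python
MASK32 = 0xFFFFFFFF

def lsl(value, amount):
    """Logical shift left, truncated to 32 bits like a register."""
    return (value << amount) & MASK32

def add_shifted(rn, rm, shift):
    """Model of 'add rd,rn,rm,lsl #shift': rd = rn + (rm << shift)."""
    return (rn + lsl(rm, shift)) & MASK32

r1, r2 = 0x10, 0x3
r1 = add_shifted(r1, r2, 8)   # add r1,r1,r2,lsl #8
print(hex(r1))                # 0x310
```

The point is that the shift and the add happen in one instruction, where most other architectures would need two.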
This can be done in almost any instruction. Exceptions are, curiously, multiply; understandably, multiply-accumulate (this one has three source registers, so they need more bits to encode that); loads and stores, which have their own address-generation rules; and some system stuff (coprocessor interfacing, software interrupts, the like).
It goes so far that they actually don't have a free-standing bit-shift instruction. If you only want to shift a register around, you must use a "dummy" move instruction (all archs have this in some form or another, but it usually only copies a register's contents to another register). For example, "mov r1,r2,lsl #4" puts r2 shifted left by four bits into r1.
The competitive "problem" with the multiply instruction, when compared to the extreme shift capabilities, is that it does not allow an immediate operand. I.e., as seen above, if you want to multiply a number by 240, you can't use this:
mul r1,r2,#240
The number 240 must be moved to a temporary register first.
mov r1,#240
mul r1,r2,r1
=====================
The other instructions in the snippet were rsb and strh.
RSB is reverse subtract, i.e. it subtracts the first operand from the second operand, which is useful because only the second source operand may have the modifiers outlined above. The "normal" subtraction is simply called sub, and of course works the other way 'round.
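Here is a small Python sketch of why the reversed operand order matters. The instruction in the comment, "rsb r1,r1,r1,lsl #4", is my own illustration of the idea and not necessarily the exact line from the earlier listing. The helper names are made up as well.

```python
MASK32 = 0xFFFFFFFF

def lsl(value, amount):
    """Logical shift left, truncated to 32 bits."""
    return (value << amount) & MASK32

def sub(rn, operand2):
    """Model of sub: first operand minus second operand."""
    return (rn - operand2) & MASK32

def rsb(rn, operand2):
    """Model of rsb: second operand minus first operand."""
    return (operand2 - rn) & MASK32

y = 7
fifteen_y = rsb(y, lsl(y, 4))   # rsb r1,r1,r1,lsl #4 -> 16*y - y = 15*y
print(fifteen_y)                # 105
print(lsl(fifteen_y, 4))        # 1680, which is 240*y without a multiply
```

With sub the shifted value would have to be the thing being subtracted, so 16*y - y could not be written in one instruction.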
STRH stores a halfword to memory. Like the other loads and stores, it computes its address as a base address plus an offset, though as far as I know only the word and byte forms (str, strb) accept a shifted register offset; the halfword forms take a plain register or a small immediate. An ARMish "halfword" is 16 bits, because an ARMish word is 32 bits, which is in contrast to x86 circles, where a word is historically 16 bits and a 32 bit quantity is called a double-word.
If it wasn't obvious, the "at" sign (@) starts comments in ARM assembly source code, so whatever follows one has been my attempt at explaining what's going on and where.
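As a rough picture of what a halfword store into a framebuffer does, here is a Python sketch using a bytearray as "memory". The 240-pixel line width comes from the example in this thread. The height of 160 lines, the little-endian byte order and the helper names are assumptions made for the illustration.

```python
import struct

WIDTH = 240    # pixels per line, as in the framebuffer example
HEIGHT = 160   # assumed height, only needed to size the buffer

framebuffer = bytearray(WIDTH * HEIGHT * 2)   # two bytes (one halfword) per pixel

def store_halfword(memory, address, value):
    """Write the low 16 bits of `value` at byte offset `address` (little-endian assumed)."""
    struct.pack_into("<H", memory, address, value & 0xFFFF)

def put_pixel(memory, x, y, color):
    """Store one 16-bit pixel at base + 2*(WIDTH*y + x)."""
    store_halfword(memory, 2 * (WIDTH * y + x), color)

put_pixel(framebuffer, 3, 2, 0x7FFF)
print(framebuffer[966:968].hex())   # ff7f, at byte offset 2*(240*2 + 3) = 966
```

Only two bytes change per store, which is exactly the 16 bits of a halfword.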
=====================
TBH I don't know much about MIPS. However, I know a bit about x86.
The x86 instruction set is very ... relaxing, because they didn't have to care much about how a certain amount of information can be packed into a fixed-length instruction encoding. Instructions have variable length, so no worries at all. You can have wide immediates everywhere for one, and you totally can do this:
imul eax,eax,240
(integer-multiply eax by 240, write result back to eax)
There's no multiply-accumulate instruction in x86.
x86 has free-standing bit-shift instructions with immediate or register-controlled shift values.
SHL eax,4
This shifts the value in eax by four bits to the left.
x86 doesn't usually allow separate source and destination registers. I.e. usually one of the source operands is overwritten with the results, this is also true for the shift operation. Multiplication is just about the only case where this rule can be broken.
x86 does allow inline shifts in address generation, but nowhere else.
mov [edi+4*ecx],eax
The above exercise in x86 would look something like this:
Code:
; common assumptions about register contents:
; ax: color to write
; ebx: x, ecx: y
; edi: pointer to start of first line in framebuffer
; line width:=240 pixels=480 bytes
; using shifts
shl ecx,4 ; ecx:=16*y
mov edx,ecx ; copy 16*y
shl ecx,4 ; ecx:=256*y
sub ecx,edx ; ecx:=256*y-16*y = 240*y
add ecx,ebx ; ecx:=240*y+x
mov word [edi+2*ecx],ax ;store ax at fb+2*(240*y+x)
Code:
; using multiply
imul ecx,240 ; ecx:=240*y
add ecx,ebx ; ecx:=240*y+x
mov word [edi+2*ecx],ax ;store ax at fb+2*(240*y+x)
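If you want to convince yourself that the two x86 variants above land on the same address, the register arithmetic can be replayed in Python. This is just a sketch: the function names are made up, registers are modelled as integers masked to 32 bits, and the 160-line height used in the loop is an assumption.

```python
MASK32 = 0xFFFFFFFF

def offset_with_shifts(x, y):
    """Replay of the shift-based listing; returns the byte offset from edi."""
    ecx = (y << 4) & MASK32        # shl ecx,4    -> 16*y
    edx = ecx                      # mov edx,ecx  -> copy 16*y
    ecx = (ecx << 4) & MASK32      # shl ecx,4    -> 256*y
    ecx = (ecx - edx) & MASK32     # sub ecx,edx  -> 240*y
    ecx = (ecx + x) & MASK32       # add ecx,ebx  -> 240*y+x
    return 2 * ecx                 # scale used in [edi+2*ecx]

def offset_with_multiply(x, y):
    """Replay of the imul-based listing; returns the byte offset from edi."""
    ecx = (y * 240) & MASK32       # imul ecx,240 -> 240*y
    ecx = (ecx + x) & MASK32       # add ecx,ebx  -> 240*y+x
    return 2 * ecx

# Compare both variants for every pixel of an assumed 240x160 screen.
for y in range(160):
    for x in range(240):
        assert offset_with_shifts(x, y) == offset_with_multiply(x, y)

print(offset_with_shifts(3, 2))    # 966
```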
Done