There's an ARMv8 document out describing AArch64 in detail. You need an account on ARM's site to download it:
https://silver.arm.com/download/download.tm?pv=1199137
Some highlights are:
- Load/store immediate offsets can have implicit scaling by the operand size in order to extend effective range (like in Thumb)
- There's still a flexible second operand: a register can be shifted by an immediate amount (LSL, LSR or ASR, plus ROR for the logical instructions but not for add/sub). add/sub/cmp also have a form allowing sign or zero extension plus an optional left shift of up to 4 bits.
- Immediates are not the classic 8-bit + ROR, but something varying per-instruction: arithmetic instructions have 12 bits with an optional shift by 12 (so you can double up to get 24-bit easily), logical instructions have bit-mask forms, and there are the 16-bit movt/movw-style instructions, extended to provide the other parts for 64-bit immediate creation
- No more integer SIMD instructions (i.e. ARMv6 SIMD); these are instead rightfully delegated to NEON
- CPSR access is broken up into separate fields (e.g. NZCV, DAIF) instead of going through one register
- MIPS-style compare and branch if zero/non-zero instructions (cbz/cbnz), plus test bit and branch instructions (tbz/tbnz)
- Conditional select can apply increment, negate, or invert to one of its operands
- Load/store register offsets can optionally be scaled by the access size, which is a reasonable limitation vs an arbitrary shift (and finally gives us proper halfword indexes!)
- +/- 1MB PC-relative loads, PC-relative address generation, and conditional branch range. +/- 128MB unconditional branch range.
- 32 and 64-bit integer division, but no direct remainder provided
- Flag setting is still optional
- Pre-indexed and post-indexed writeback addressing is still provided, but with immediates only, and these forms give up some offset bits (as well as the optional scaling)
- Address offsets are additive only (can use negative immediates though)
- Non-temporal hints for load/store pair instructions
- add/sub/cmp/cmn/mov can access SP (read and write), but can only read it when setting flags, while the immediate forms of the logical instructions can write to SP
- Of the logical instructions, only AND (and BIC, its inverted-operand form) can set condition codes
- Variable shift amounts are now taken modulo the register size, so they can no longer exceed it like in classic ARM. I'm sure this will introduce a subtle bug somewhere...
- Conditional compare is just for cmn and cmp
- Multiply-negate instructions
- Floating-point multiply-add with four operands; I think it's implied that it's fused and that the chained form is no longer available
- Floating point gets conditional selects too
- Lane insert/extract instructions have been added to accommodate the new NEON registers no longer being natively packed (I was hoping this much would at least be true), and instructions can target the top 64 bits
- SIMD multiply-add is definitely fused only now, including reciprocal approximation steps
- Vector normalization acceleration instructions
- Unsigned to signed saturated arithmetic (I actually hoped this would be there!)
- NEON now has support for scalar operations, and full horizontal min, max, and sum
- NEON table lookup extended to 4 128-bit registers (instead of 4 64-bit), meaning up to 64-wide 8-bit shuffles
- Vector floating point division
- Vector element to element move
- Vector reciprocal exponent approximation
- Vector bit-reverse
- Explicit instructions for cache and TLB management
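To make the immediate handling above a bit more concrete, here is a rough Python sketch of how a 64-bit constant could be split into 16-bit pieces for the movz/movk-style instructions, and how the 12-bit arithmetic immediate with its optional shift by 12 behaves. The function names are my own invention for illustration, and this is a naive model: a real assembler or compiler would also consider the inverted move form and the logical bit-mask immediates before emitting four instructions.

```python
def movz_movk_sequence(value, reg="x0"):
    """Naively build a 64-bit constant from 16-bit chunks.

    Returns assembly-like strings: one movz for the first non-zero chunk,
    then one movk per remaining non-zero chunk. Hypothetical helper, not
    taken from any real toolchain.
    """
    value &= 0xFFFFFFFFFFFFFFFF
    chunks = [(shift, (value >> shift) & 0xFFFF) for shift in (0, 16, 32, 48)]
    nonzero = [(shift, chunk) for shift, chunk in chunks if chunk != 0]
    if not nonzero:
        return [f"movz {reg}, #0x0"]
    lines = []
    for i, (shift, chunk) in enumerate(nonzero):
        op = "movz" if i == 0 else "movk"  # movz clears the rest, movk keeps it
        suffix = f", lsl #{shift}" if shift else ""
        lines.append(f"{op} {reg}, #0x{chunk:x}{suffix}")
    return lines


def fits_add_immediate(value):
    """True if value is a 12-bit unsigned immediate, optionally shifted left by 12."""
    if not 0 <= value < (1 << 24):
        return False
    return (value & ~0xFFF) == 0 or (value & ~0xFFF000) == 0


def split_add_immediate(value):
    """Split a 24-bit value into the two immediates for a pair of add instructions."""
    if not 0 <= value < (1 << 24):
        raise ValueError("value does not fit in 24 bits")
    return value & 0xFFF, value & 0xFFF000


if __name__ == "__main__":
    for line in movz_movk_sequence(0x123456789ABCDEF0):
        print(line)
    print(fits_add_immediate(0x1001), split_add_immediate(0x123456))
```

The "double up to get 24-bit" trick is the `split_add_immediate` case: one add with the low 12 bits, and a second add with the high 12 bits using the shift-by-12 form.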
All in all I'm quite impressed with the ISA, and would definitely not consider it a MIPS clone.
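For anyone wondering where the variable shift change and the missing remainder instruction might bite, here is a small Python model of both. The helper names are made up for this post, and the models only cover the behaviour discussed above: the classic ARM register-specified shift uses the bottom byte of the shift register and yields zero for amounts of 32 or more, the AArch64 variable shift takes the amount modulo the register width, and the remainder is recovered from the quotient with a multiply-subtract.

```python
def lsl_classic_arm(value, amount):
    """Classic ARM register-specified LSL on a 32-bit register.

    Only the bottom byte of the amount is used; 32 or more shifts
    every bit out and gives 0.
    """
    amount &= 0xFF
    if amount >= 32:
        return 0
    return (value << amount) & 0xFFFFFFFF


def lsl_a64(value, amount, bits=64):
    """AArch64 variable LSL: the amount is taken modulo the register width."""
    mask = (1 << bits) - 1
    return ((value & mask) << (amount % bits)) & mask


def udiv(n, d, bits=64):
    """Unsigned divide as modelled here: division by zero yields 0, no trap."""
    mask = (1 << bits) - 1
    n &= mask
    d &= mask
    return 0 if d == 0 else n // d


def urem(n, d, bits=64):
    """Remainder from a divide followed by a multiply-subtract: n - (n / d) * d."""
    mask = (1 << bits) - 1
    q = udiv(n, d, bits)
    return ((n & mask) - q * (d & mask)) & mask


if __name__ == "__main__":
    # Same source-level shift, different answers on the two architectures.
    print(lsl_classic_arm(1, 32), lsl_a64(1, 32, bits=32))
    print(udiv(17, 5), urem(17, 5))
```

Code that relied on a shift by 32 clearing a 32-bit register is exactly the kind of thing that would quietly change behaviour here.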