1-Projective texture and cubemap lookups used to be handled as a single instruction by the TEX hardware; now they are processed in the ALUs.
Other examples:
Vertex Fetch (for a long time now)
Projection/Cubemaps (since DX10)
Interpolators (since GCN)
A side effect of this trend is that things that were previously more or less free can now come at a moderate cost in ALU instructions.
2-The GCN architecture is a fair bit more restricted than earlier AMD hardware. The stated goal has been to reduce complexity to allow for more efficient hardware. The downside is that in some cases this leads to longer shaders than you would see on earlier hardware.
3-The hardware can also only read a single scalar register per instruction, and not at the same time as a literal constant. This can make it hard to take full advantage of the MAD instruction.
The implication is that you should write shaders that fit this MAD-form to the greatest extent possible.
The GCN architecture complicates the issue somewhat due to its more restricted instruction set. As a general rule of thumb, writing in MAD-form is still the way to go; however, there are cases on GCN where it may not be beneficial, or may even add a couple of scalar instructions.
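As a rough illustration of what MAD-form means, here is a minimal sketch in plain Python standing in for shader arithmetic. The function names and values are hypothetical and not from the original notes. It rewrites an add-then-multiply expression so that it maps to a single multiply-add, assuming the offset and scale are constants whose product can be folded ahead of time.

```python
def add_then_mul(x, offset, scale):
    # (x + offset) * scale: an ADD followed by a MUL, i.e. two instructions.
    return (x + offset) * scale


def mad_form(x, offset, scale):
    # x * scale + (offset * scale): when offset and scale are constants,
    # their product can be computed once up front (assumption), leaving a
    # single multiply-add per evaluation.
    folded = offset * scale
    return x * scale + folded


# Both forms agree for these exactly representable values.
print(add_then_mul(3.0, 1.0, 0.5), mad_form(3.0, 1.0, 0.5))
```

The two forms are algebraically equal but not guaranteed to be bit-identical in floating point, which is likely one reason to write the MAD-form by hand rather than rely on the compiler to produce it.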
4-Transcendentals (such as rcp, rsq and sqrt) run at 1/4 rate, i.e. at a quarter of the throughput of a simple ALU instruction.
5-ROPs: As hardware has become increasingly powerful over the years, some parts of it have lagged behind. The number of ROPs (i.e. how many pixels we can output per clock) remains very low. While this reflects typical use cases where the shader is reasonably long, it may limit the performance of short shaders. Unless the output format is wide, we are not even theoretically capable of using the full memory bandwidth available. On the HD7970 we need a 128-bit format to become bandwidth bound. On the PS4, 64 bits would suffice.
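The arithmetic behind those two figures can be sketched as follows. The hardware numbers are not in these notes; they are the commonly quoted specs, taken here as assumptions: HD7970 with 32 ROPs at 925 MHz and 264 GB/s of memory bandwidth, PS4 with 32 ROPs at 800 MHz and 176 GB/s. The function names are hypothetical, and the model is the simplest possible one: one pixel per ROP per clock, compared against peak bandwidth.

```python
def rop_output_gb_per_s(rops, clock_mhz, bits_per_pixel):
    # Peak bytes written per second if every ROP outputs one pixel per clock.
    bytes_per_pixel = bits_per_pixel / 8
    return rops * clock_mhz * 1e6 * bytes_per_pixel / 1e9


def narrowest_bandwidth_bound_format(rops, clock_mhz, mem_bw_gb_per_s,
                                     formats=(32, 64, 128)):
    # Return the narrowest format (in bits per pixel) at which the ROPs
    # could, in theory, saturate memory bandwidth; None if none can.
    for bits in formats:
        if rop_output_gb_per_s(rops, clock_mhz, bits) >= mem_bw_gb_per_s:
            return bits
    return None


# Assumed specs (not from the notes): (ROPs, clock in MHz, bandwidth in GB/s).
hd7970 = (32, 925, 264)
ps4 = (32, 800, 176)

print(narrowest_bandwidth_bound_format(*hd7970))  # 128
print(narrowest_bandwidth_bound_format(*ps4))     # 64
```

Under these assumptions a 32-bit target on the HD7970 tops out at roughly 118 GB/s and a 64-bit target at roughly 237 GB/s, both below 264 GB/s, so only a 128-bit format is limited by bandwidth rather than by the ROPs.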
The solution is to use a compute shader. Writing through a UAV bypasses the ROPs and goes straight to memory. This obviously does not apply to all kinds of rendering; for one thing, we also skip the entire graphics pipeline, on which we still depend for most normal rendering. However, in the cases where it applies, it can result in a substantial performance increase. It is useful, for example, when initializing textures to something other than a constant color, or for simple post-effects.