What's the source of this list?
I would be a bit disappointed if the "Primitive Discard Accelerator" turned out to be a marketing name for conservative rasterization. That naming wouldn't make much sense. I was hoping that AMD would take a big leap forward in primitive processing performance. Nvidia is much faster at rejecting invisible triangles.
Hopefully "Memory Compression" means that writes from compute shaders also support delta compression. That would be awesome. I already have some ideas on how to exploit it (in sparse data structures).
I actually had a conversation about instruction pre-fetch on Twitter just a few weeks ago. I couldn't find any public documents describing GCN (1.0-1.2) instruction pre-fetch. This is important to know if you want to build a "jump table" style shader system. Sort all shaders by GPR count and bucket them into GCN occupancy classes (http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2014/05/gcn_vgpr_table.png). This results in 10 different shaders in total (one per occupancy class), allowing you to execute any number of different shaders in just 10 compute dispatches. It's a nice trick for tiled/clustered lighting, for example (especially when combined with deferred texturing). A simple pre-fetch (loading the first N instructions of each shader) is not a good idea for shaders like this, because the code that actually runs sits behind the jump. A pre-fetch after the jump would of course work just fine.
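To make the bucketing step concrete, here is a rough sketch in Python. It assumes the figures usually quoted for GCN (256 VGPRs per SIMD lane, allocation in steps of 4, at most 10 waves per SIMD), which should match the linked table but are assumptions here. The shader names and VGPR counts are made up for illustration.

```python
from collections import defaultdict

# Assumed GCN figures (not confirmed by the post itself):
VGPR_BUDGET = 256      # VGPRs available per SIMD lane
VGPR_GRANULARITY = 4   # VGPRs are allocated in multiples of this
MAX_WAVES = 10         # upper limit on waves per SIMD


def occupancy_class(vgpr_count):
    """Return how many waves per SIMD fit for a shader using vgpr_count VGPRs."""
    if not 1 <= vgpr_count <= VGPR_BUDGET:
        raise ValueError("VGPR count out of range")
    # Round the request up to the allocation granularity.
    allocated = -(-vgpr_count // VGPR_GRANULARITY) * VGPR_GRANULARITY
    return min(MAX_WAVES, VGPR_BUDGET // allocated)


def bucket_shaders(shaders):
    """Group shader names by occupancy class.

    Each bucket would become one jump-table shader, i.e. one dispatch.
    """
    buckets = defaultdict(list)
    for name, vgprs in shaders.items():
        buckets[occupancy_class(vgprs)].append(name)
    return dict(buckets)


if __name__ == "__main__":
    # Hypothetical shaders with hypothetical VGPR counts.
    example = {"lambert": 20, "ggx": 24, "skin": 40, "hair": 100}
    for waves, names in sorted(bucket_shaders(example).items(), reverse=True):
        print(waves, names)
```

Each bucket is compiled with the VGPR count of its most expensive member, so every shader in it runs at the same occupancy. With at most 10 classes, that caps the number of dispatches at 10 no matter how many individual shaders there are.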