AMD: Southern Islands (7*** series) Speculation/ Rumour Thread

At GDC, Avalanche Studios gave a presentation focusing on GCN hardware in next gen. These are some of the most notable points:


1-Projective texture and cubemap lookups used to be handled as a single instruction by the TEX hardware; now they are processed in the ALUs.

Other examples of work moving to the ALUs:
Vertex Fetch (since a long time ago)
Projection/Cubemaps (since DX10)
Interpolators (since GCN)

A side effect of this trend is that things that previously were more or less for free could now come at a moderate cost in terms of ALU instructions.

2-The GCN architecture is a fair bit more restricted than earlier AMD hardware. The stated goal has been to reduce complexity to allow for more efficient hardware. The downside is that in some cases this leads to longer shaders than what you would see on earlier hardware.


3-The hardware can also only read a single scalar register per instruction, and not at the same time as a literal constant. This can be problematic for taking full advantage of the MAD instruction.

The implication, naturally, is that you should write shaders that fit this MAD-form to the greatest extent possible.
The GCN architecture complicates the issue somewhat due to its more restricted instruction set. As a general rule of thumb, writing in MAD-form is still the way to go; however, there are cases on GCN where it may not be beneficial, or may even add a couple of scalar instructions.
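As a rough illustration of what "MAD-form" means, here is a minimal Python sketch. The function names are made up for this example, and the instruction counts in the comments are an assumption about what a shader compiler would typically emit, not something Python itself does. It only shows that the two forms are algebraically the same once the constant product is folded ahead of time.

```python
import math

def add_then_mul(x, a, b):
    # (x + a) * b: an ADD followed by a MUL, so two ALU instructions (assumed)
    return (x + a) * b

def mad_form(x, b, ab):
    # x * b + ab: fits a single multiply-add, with ab = a * b precomputed
    return x * b + ab

a, b = 0.5, 2.0
ab = a * b  # folded once, e.g. on the CPU or by the compiler for constants

for x in (0.0, 1.0, -3.25):
    # Both forms give the same value; only the instruction shape differs
    assert math.isclose(add_then_mul(x, a, b), mad_form(x, b, ab))
```

Whether the rewritten form actually saves an instruction on GCN depends on where the operands live, which is exactly the caveat about scalar registers and literal constants above.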

4-Transcendentals are 1/4 rate.

5-ROPs: As hardware has gotten increasingly more powerful over the years, some parts of it have lagged behind. The number of ROPs (i.e. how many pixels we can output per clock) remains very low. While this reflects typical use cases where the shader is reasonably long, it may limit the performance of short shaders. Unless the output format is wide, we are not even theoretically capable of using the full bandwidth available. For the HD7970 we need a 128bit format to become bandwidth bound. For the PS4 64bit would suffice.
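The arithmetic behind those two statements can be sketched as below. The hardware figures are not given in the post; they are commonly published specifications (32 ROPs at 925 MHz with 264 GB/s for the HD7970, 32 ROPs at 800 MHz with 176 GB/s for the PS4), so treat them as assumptions. The helper names are hypothetical.

```python
def rop_write_rate_gb_s(rops, clock_mhz, bytes_per_pixel):
    # Peak output rate if every ROP writes one pixel per clock
    return rops * clock_mhz * 1e6 * bytes_per_pixel / 1e9

def limiting_factor(rops, clock_mhz, bytes_per_pixel, mem_bw_gb_s):
    # If the ROPs could push more data than memory can absorb, bandwidth limits us
    rate = rop_write_rate_gb_s(rops, clock_mhz, bytes_per_pixel)
    return "bandwidth" if rate > mem_bw_gb_s else "ROP"

# Assumed specs: (ROPs, core clock in MHz, memory bandwidth in GB/s)
gpus = {"HD7970": (32, 925, 264.0), "PS4": (32, 800, 176.0)}

for name, (rops, mhz, bw) in gpus.items():
    for bits in (32, 64, 128):
        rate = rop_write_rate_gb_s(rops, mhz, bits // 8)
        bound = limiting_factor(rops, mhz, bits // 8, bw)
        print(f"{name} {bits}bit: {rate:.1f} GB/s vs {bw:.0f} GB/s -> {bound} bound")
```

With these assumed numbers, the HD7970 stays ROP bound up to 64bit formats and only becomes bandwidth bound at 128bit, while the PS4 is already bandwidth bound at 64bit.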

The solution is to use a compute shader. Writing through a UAV bypasses the ROPs and goes straight to memory. This solution obviously does not apply to all sorts of rendering; for one, we are also skipping the entire graphics pipeline, on which we still depend for most normal rendering. However, in the cases where it applies it can certainly result in a substantial performance increase. It would be useful in cases such as initializing textures to something other than a constant color, or simple post-effects.



www.humus.name/Articles/Persson_LowlevelShaderOptimization.pdf
 
Holy Thread-resurrection Batman - but we didn't seem to have a product/review thread for Tahiti.

Happy Birthday, AMD Radeon HD 7970. So sad that the driver team dropped you to legacy driver support in June, but you're still one of the best-aged graphics chips of all time and now debut as a "teenager". Better still: your rather rare 6 GByte variant even runs Cyberpunk 2077. It was the debut of compute-oriented throughput machines, aka GCN, traces of which can still be identified in RDNA2 and CDNA2 ten years later. What a chip.

edit:
This is the best desktop graphics architecture and physical implementation ever. Some rough edges, but that's the long and short of it.
Were you already with AMD at the time? Or was that your unbiased, private opinion?
 
I built a new PC for my young nephews a few months ago and, to keep costs down amidst the craziness of the consumer GPU market over the past 18 months, I installed a Radeon R9 280 (refreshed 7950) I bought in 2014. For 1080p gaming it handles everything they want to play off Steam. Temperatures at load are about 25 degrees lower than the GTX 980 Ti in my system.

The new Radeon control panel is a complete breath of fresh air versus the Nvidia Control Panel / GeForce Experience (seriously why does it take 5 seconds to toggle one setting change in the NV CP!) as well. Was really impressed by the performance and overall package for an architecture released in late 2011.
 