AMD Radeon VII Announcement and Discussion

How could he say that using compute shaders with async and RPM would be faster without having seen how fast the primitive shaders were?

By reading the whitepaper and the RTG documentation, and thinking at a theoretical level that it won't be faster than compute shaders...? I'd put the question in reverse: how can he say that when it's not exposed at all...
 
How could he say that using compute shaders with async and RPM would be faster without having seen how fast the primitive shaders were?
Some features may not be available with primitive shaders due to hardware or software restrictions. They may serialize in the graphics pipeline and so miss out on potentially "free" performance boosts.
 
AMD really needs to work on their hotspot/junction temperature woes. Reading around forums, the card can hit 2 GHz comfortably if the junction temperature is under control, and there are a few 2.1-2.2 GHz overclocks from good samples with low hotspot temperatures.

But instead, some part of the chip is over 100 °C most of the time, which can lead to crashes even at stock. I also think it'd lead to earlier chip failures.
 
Didn't the cooler make pretty poor contact with the chip, necessitating the use of a thermal pad over normal thermal paste?
 
Hotspot/junction temperature is an interesting problem I never gave much thought to - I wonder what causes it? I suppose there's no chance of public SW tools showing the value of all the individual temperature sensors and their location... :(
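For reference, here's a quick sketch of what *is* publicly readable (assuming Linux and the amdgpu kernel driver, which as far as I know exposes only the aggregated edge/junction/memory values via hwmon, not the individual on-die sensors or their locations):

```python
# Minimal sketch (assumes Linux + the amdgpu driver): print the aggregated
# temperature sensors the driver exposes via hwmon. On Vega20 these are
# typically temp1 "edge", temp2 "junction" and temp3 "mem"; the individual
# on-die sensors behind the junction value are not exposed to userspace.
from pathlib import Path

def read_amdgpu_temps():
    for hwmon in Path("/sys/class/hwmon").glob("hwmon*"):
        name_file = hwmon / "name"
        if not name_file.exists() or name_file.read_text().strip() != "amdgpu":
            continue
        for label_file in sorted(hwmon.glob("temp*_label")):
            label = label_file.read_text().strip()
            input_file = hwmon / label_file.name.replace("_label", "_input")
            millideg = int(input_file.read_text().strip())  # reported in m°C
            print(f"{label:>10}: {millideg / 1000:.1f} °C")

if __name__ == "__main__":
    read_amdgpu_temps()
```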

Looking at the GCN die shots:

It seems quite homogeneous, arguably even more so than NVIDIA designs. So unless it's the center area for routing or the edge areas for memory that take the most power, it seems unlikely there are any small hotspots... any "hotspot" area would be quite large. I was thinking very fine-grained power gating could help with hotspots (e.g. disable CUs near hotspots until temperature normalises) but even if that was possible (I don't think AMD has indicated their architecture can currently do that), it wouldn't help with large hotspots, only small ones.

In fact it's so "regular" that all the units of a given kind are usually close to each other, so if certain units take significantly more power, that might make hotspots worse - e.g. if it was really just the CUs/ALUs that are super power heavy, then it's really the central 1/3rd of the die that would take the most power, which might be hard to dissipate. If you look at NVIDIA designs, the shader cores are distributed all the way to the edges, which again *might* make hotspots slightly less of a problem.

Or, more likely, I'm massively overthinking this and the only real difference between AMD and NVIDIA here is some obscure packaging technology thing I don't know anything about... :)
 
BTW - I am always *very* wary of "stable overclocks". What, exactly, do you consider stable? How do you test that the chip produces the correct results without random errors/artifacts - not just in the paths that are typically stressed, but for all transistors? For example, let's say that for some silly reason there was either a bit less timing slack or more chip-to-chip variability in the TMU's FP32 filtering units, resulting in small random errors, but only for FP32 filtering. It's very unlikely you'd notice that, but at the extreme, it might even end up failing WHQL tests...
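To illustrate the kind of check I mean, here's a rough sketch (assuming pyopencl and numpy are available; the kernel, sizes and tolerances are all made up for illustration, this is not anyone's actual validation flow) that runs an ALU-heavy kernel in a loop and compares every result against a CPU reference, so silent math errors show up instead of just crashes:

```python
# Hypothetical correctness-stress sketch: run a dependent FMA chain on the GPU
# repeatedly and compare against a CPU reference, flagging silent bit errors.
import numpy as np
import pyopencl as cl

KERNEL = """
__kernel void fma_chain(__global const float *a,
                        __global const float *b,
                        __global float *out)
{
    int i = get_global_id(0);
    float acc = a[i];
    // A long dependent chain keeps the ALUs busy and makes errors visible.
    for (int k = 0; k < 4096; ++k)
        acc = fma(acc, 1.0000001f, b[i]);
    out[i] = acc;
}
"""

def cpu_reference(a, b):
    acc = a.copy()
    for _ in range(4096):
        acc = acc * np.float32(1.0000001) + b
    return acc

def main(iterations=20, n=1 << 18):
    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    prg = cl.Program(ctx, KERNEL).build()

    rng = np.random.default_rng(0)
    a = rng.random(n, dtype=np.float32)
    b = rng.random(n, dtype=np.float32)
    expected = cpu_reference(a, b)

    mf = cl.mem_flags
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
    out = np.empty_like(a)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, out.nbytes)

    for it in range(iterations):
        prg.fma_chain(queue, (n,), None, a_buf, b_buf, out_buf)
        cl.enqueue_copy(queue, out, out_buf).wait()
        # The GPU uses fused multiply-add (single rounding) while the CPU
        # reference doesn't, so exact equality is too strict; flag outliers.
        ok = np.isclose(out, expected, rtol=1e-4)
        if not ok.all():
            print(f"iteration {it}: {int((~ok).sum())} mismatching elements")
            return
    print("no mismatches seen (which still doesn't prove every unit is fine)")

if __name__ == "__main__":
    main()
```

And of course, even passing a loop like that for hours only covers the one path it exercises - which is kind of the point.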

I had personal experience trying to work around some really bizarre and nasty HW/process issues in SW for a GPU that worked fine at low frequencies/high voltages, but had some very specific parts that didn't work correctly unless they were clocked/volted much more conservatively than the rest of the chip (which wasn't possible). This resulted in a horrible mix of SW workarounds and/or manually lowering the chip's clocks when certain features were detected as being in use by the driver.

I don't expect such nightmare-ish and extreme process issues to happen on any production PC GPU, but the point remains: "stable overclock" doesn't mean very much IMO. AMD has better tools to figure out what clocks/voltages are safe than we do... it's possible they're doing an awful job at it, but I think it's more likely they need to optimise their hardware design rather than their binning process.
 
At release, Vega had dies without resin and the speculation was that the hotspot was due to uneven contact. But people with molded dies also got very bad hotspot temperatures compared to the core, like a 30 °C difference. AMD sought to make amends with the VII and ships a graphite pad by default, which is apparently better at transferring heat across the surface and at making contact between an uneven die and heatsink. Yet people still get runaway junction temps over 100 °C; I've seen examples of people getting better temps by changing the mounting pressure and lapping the heatsink. TechPowerUp did a washer mod that took 10 °C off their junction temperature in their Radeon VII review.

With boost clocks, stable overclocks are another headache: you could have the card stable in benchmarks that peg it at a heavy load, while it'd crash in minutes with the varying load and all the clock and voltage bumps in a game.

The VII also seems to have different stock voltages on different cards; here's the community database for overclocks:

https://docs.google.com/spreadsheets/d/1Iim9e_ejX3nkgxLIZ3vLu1seQ1m0lDTKUhClJpAO-Gk/edit#gid=0
 
>100 °C junction is not a runaway condition; it is, in fact, even mentioned in AMD's reviewer's guide for the Radeon VII in relation to how using the junction temperature can enable higher performance.
 
I didn't mean it as a runaway condition, more that the junction temperature isn't under control, leading to situations where you could have the core temperature below 70 °C but the junction temperature limiting clocks because it's touching the 110 °C limit.
 
Radeon VII's early retirement means that AMD has now lost the only consumer card in its current lineup capable of driving 4K games with maxed-out settings at a decent framerate. What that means, in my opinion, is that we're not going to see a big Navi GPU anytime soon. Otherwise it would make sense for AMD to keep the Radeon VII, which they're not making any real profit on, in production just a little longer, at least as token competition for the GeForce RTX 2080.
 