AMD Radeon VII Announcement and Discussion

Rootax · Feb 19, 2019

ToTTenTranz said:
How could he say that using compute shaders with async and RPM would be faster without having seen how fast the primitive shaders were?

By reading the whitepaper and RTG documentation and, at the theoretical level, concluding that it won't be faster than compute shaders...? I have the question in reverse: how can he say that while it's not exposed at all...

Anarchist4000 · Feb 27, 2019

ToTTenTranz said:
How could he say that using compute shaders with async and RPM would be faster without having seen how fast the primitive shaders were?

Some features may not be available with primitive shaders due to hardware or software restrictions. They may serialize in the graphics pipeline, and so miss out on potentially "free" performance boosts.

gamervivek · Mar 7, 2019

AMD really needs to work on their hotspot/junction temperature woes. Reading around forums, the card can hit 2 GHz comfortably if the junction temperature is under control, and a few people even have 2.1-2.2 GHz overclocks with good samples and low hotspots.

But instead, some part of the chip is over 100C most of the time, which can lead to crashes even at stock. I also think that it'd lead to earlier chip failures.

snarfbot · Mar 7, 2019

Didn't the cooler make pretty poor contact with the chip, necessitating the use of a thermal pad over normal thermal paste?

Arun · Mar 7, 2019

Hotspot/junction temperature is an interesting problem I never gave much thought to - I wonder what causes it? I suppose there's no chance of public SW tools showing the value of all the individual temperature sensors and their location...

Looking at the GCN die shots:

https://flic.kr/p/46202429935

It seems quite homogeneous, arguably even more so than NVIDIA designs. So unless it's the center area for routing or the edge areas for memory that take the most power, it seems unlikely there are any small hotspots... any "hotspot" area would be quite large. I was thinking very fine-grained power gating could help with hotspots (e.g. disable CUs near hotspots until temperature normalises) but even if that was possible (I don't think AMD has indicated their architecture can currently do that), it wouldn't help with large hotspots, only small ones.

In fact it's so "regular" that all the units of a given kind are usually close to each other, so if certain units take significantly more power, that might make hotspots worse - e.g. if it was really just the CUs/ALUs that are super power heavy, then it's really the central 1/3rd of the die that would take the most power, which might be hard to dissipate. If you look at NVIDIA designs, the shader cores are distributed all the way to the edges, which again *might* make hotspots slightly less of a problem.

Or, more likely, I'm massively overthinking this and the only real difference between AMD and NVIDIA here is some obscure packaging technology thing I don't know anything about...

Arun · Mar 7, 2019

BTW - I am always *very* wary of "stable overclocks". What, exactly, do you consider stable? How do you test that the chip produces the correct results without random errors/artifacts - not just in the paths that are typically stressed, but for all transistors? For example, let's say that for some silly reason, there was either a bit less timing slack or more chip-to-chip variability in the TMU's FP32 Filtering Units, resulting in small random errors but only for FP32 filtering. It's very unlikely you'd notice that, but at the extreme, it might even end up failing WHQL tests...

I had personal experience with trying to work around some really bizarre and nasty HW/process issues in SW for a GPU that worked fine at low frequencies/high voltages, but had some very specific parts not work correctly unless they were clocked/volted much more conservatively than the rest of the chip (which wasn't possible). This resulted in a horrible mix of SW workarounds and/or manually lowering the chip's clocks when certain features were detected as being in use by the driver.

I don't expect such nightmare-ish and extreme process issues to happen on any production PC GPU, but the point remains: "stable overclock" doesn't mean very much IMO. AMD has better tools to figure out what clocks/voltages are safe than we do... it's possible they're doing an awful job at it, but I think it's more likely they need to optimise their hardware design rather than their binning process.
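
As a rough illustration of the kind of correctness check being asked about, here is a minimal sketch: run a deterministic job repeatedly and compare a hash of its output against a reference captured at known-good clocks. Everything in it is hypothetical. `workload` is a CPU stand-in for a real GPU job, and a real test would have to exercise each hardware path separately (FP32 filtering included), which this does not attempt.

```python
import hashlib
import struct
from typing import Callable


def workload(seed: int, n: int = 10000) -> bytes:
    """Hypothetical deterministic job: the same seed always yields the same bytes."""
    x = seed & 0xFFFFFFFF
    out = bytearray()
    for _ in range(n):
        # Simple linear congruential step, standing in for real GPU work.
        x = (1664525 * x + 1013904223) & 0xFFFFFFFF
        out += struct.pack("<I", x)
    return bytes(out)


def digest(data: bytes) -> str:
    """Hash the full output so that a single flipped bit is detected."""
    return hashlib.sha256(data).hexdigest()


def count_mismatches(run: Callable[[], bytes], reference_digest: str, repeats: int) -> int:
    """Re-run the job and count how many runs differ from the reference."""
    return sum(1 for _ in range(repeats) if digest(run()) != reference_digest)


# Reference captured once, under conditions assumed to be known-good.
reference = digest(workload(1))
mismatches = count_mismatches(lambda: workload(1), reference, repeats=5)
```

A non-zero mismatch count would mean silent corruption even though nothing crashed, which is exactly the case a "it ran a benchmark for an hour" test misses.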

gamervivek · Mar 9, 2019

At release, Vega had dies without resin, and speculation was that the hotspot was due to uneven contact. But people with molded dies also got very bad hotspot temperatures compared to the core, like a 30C difference. AMD sought to make amends with VII and used a graphite pad by default, which is apparently better at transferring heat across the surface and at making contact between an uneven die and heatsink. Yet people still get junction temps running away over 100C, and I've seen examples of people getting better temps by changing the mounting pressure and lapping the heatsink. TechPowerUp did a washer mod that took 10C off their junction temperature in their VII review.

With boost clocks, stable overclocks are another headache: you could have the card stable in benchmarks that peg it at a heavy load, while it'd crash in minutes with the varying load and all the clock and voltage bumps in a game.

VII also seems to have different stock voltages for different cards. Here's the community database for overclocks:

https://docs.google.com/spreadsheets/d/1Iim9e_ejX3nkgxLIZ3vLu1seQ1m0lDTKUhClJpAO-Gk/edit#gid=0

CarstenS · Mar 9, 2019

>100 °C junction is not a runaway condition. It is, in fact, even mentioned in AMD's reviewer's guide for RVII, in relation to how using junction temperature can enable higher performance.

gamervivek · Mar 10, 2019

I didn't mean it as a runaway condition, more that the junction temperature isn't under control, leading to situations where you could have the core temperature below 70C but the junction temperature is limiting clocks because it's touching the 110C limit.
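
To put that situation in concrete terms, here is a small sketch that reports which sensor is the one holding clocks back. The 110C junction limit is the figure from this thread; the edge (core) limit and the margin are assumed values for illustration only, not AMD's actual numbers, and `limiting_sensor` is a hypothetical helper rather than anything the driver exposes.

```python
from typing import Optional

JUNCTION_LIMIT_C = 110.0  # junction limit mentioned in the thread
EDGE_LIMIT_C = 95.0       # assumed edge/core limit, illustration only


def limiting_sensor(edge_c: float, junction_c: float, margin_c: float = 2.0) -> Optional[str]:
    """Return the sensor closest to its limit if it is within margin_c, else None."""
    headroom = {
        "edge": EDGE_LIMIT_C - edge_c,
        "junction": JUNCTION_LIMIT_C - junction_c,
    }
    # The sensor with the least headroom is the one that throttles first.
    name = min(headroom, key=headroom.get)
    return name if headroom[name] <= margin_c else None


# Core looks cool at 68C, but the junction is 1C away from its limit.
result = limiting_sensor(edge_c=68.0, junction_c=109.0)
```

With those readings the helper returns "junction": the core temperature has plenty of headroom, yet the card still cannot boost any higher.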

gamervivek · Mar 12, 2019

Mine became unstable and hard locked my PC, so I lapped mine and the temps came down 40C. Not even kidding. Prior to lapping, the cold plate didn't even touch the majority of the die. Whoever AMD contracted to make the heatsink did a horrendous job.

https://forums.overclockers.co.uk/posts/32571168/

A1xLLcqAgt0qc2RyMz0y · Jul 12, 2019

AMD Radeon VII reaches end of life (EOL)

https://www.fudzilla.com/news/graphics/49039-amd-radeon-vii-reaches-end-of-life-eol

Malo · Jul 12, 2019

Shortest lifespan for a GPU ever?

no-X · Jul 12, 2019

GeForce FX 5800 Ultra - 4.5 months on market
GeForce 6800 Ultra Extreme - reviewed and cancelled / EOLed
GeForce 7800 GTX 512 - 4 months on market, never widely available

ninelven · Jul 12, 2019

3dfx Voodoo 5 6000.

Malo · Jul 12, 2019

ninelven said:
3dfx Voodoo 5 6000.

Does that actually count? It was never released?

DavidGraham · Jul 12, 2019

Malo said:
Shortest lifespan for a GPU ever?

Maybe Jensen was right about this one being a marketing stunt, cooked up at the last minute?

Malo · Jul 12, 2019

DavidGraham said:
Maybe Jensen was right about this one being a marketing stunt, cooked up at the last minute?

I think it was fairly obvious to everyone when it was released that it didn't have much of a market and wouldn't be around long. I don't believe it was necessary to plug your beloved Jensen.

DavidGraham · Jul 12, 2019

Malo said:
I don't believe it was necessary to plug your beloved Jensen.

Nah, it was fun ..

digitalwanderer · Jul 12, 2019

Well I for one am a tad bit disappointed.

Betonmischer · Jul 13, 2019

Radeon VII's early retirement means that AMD has now lost the only consumer card in the current lineup capable of driving 4K games with maxed out settings at a decent framerate. What that means, in my opinion, is we're not going to see a big Navi GPU anytime soon. Otherwise it would make sense for AMD to keep Radeon VII, which they're not making any good profit off of, in production just a little longer at least as a token competition for the GeForce RTX 2080.
