AMD Vega Hardware Reviews

Correct me if I'm wrong, but didn't NVIDIA go to software scheduling already with Kepler?
Different aspect of software scheduling, but yes, that's what they have been doing. This would be the compiler generating short instruction sequences that use temporary registers, the register file cache if you will. It would be problematic if GCN did it, as each subsequent instruction comes from a different wave, so the temporary registers would be overwritten. It would require each wave to run a handful of instructions, or at least run until it stalled, before the next wave is scheduled. A matrix multiplication, for example, is a commonly repeated set of instructions with a lot of data sharing.
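To make the issue-order point concrete, here's a toy model (my own sketch, not actual hardware behavior): each wave executes a dependent chain, like the accumulator of that matrix-multiply inner loop, and a hypothetical one-entry temporary register only survives until the next issued instruction. Back-to-back issue from one wave keeps hitting it; strict round-robin across waves never does.

```python
def reuse_rate(issue_order):
    """issue_order: the wave id of each issued instruction, in issue order.
    A reuse hit requires the producer (the previous instruction from the
    same wave) to have issued in the immediately preceding slot, because
    the one-entry temporary register is overwritten on every issue."""
    hits = sum(1 for prev, cur in zip(issue_order, issue_order[1:]) if prev == cur)
    return hits / (len(issue_order) - 1)

WAVES, INSTRS = 4, 8

# Compiler-scheduled bursts: each wave issues a short back-to-back sequence.
bursts = [w for w in range(WAVES) for _ in range(INSTRS)]

# Strict round robin: every issue slot goes to a different wave.
round_robin = [w for _ in range(INSTRS) for w in range(WAVES)]

print(f"bursts:      {reuse_rate(bursts):.0%} of temporaries reusable")
print(f"round robin: {reuse_rate(round_robin):.0%} of temporaries reusable")
```

With these numbers the bursts keep roughly 90% of results reusable, while strict round-robin reuses none.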

http://videocardz.com/71280/amd-vega-10-vega-11-vega-12-and-vega-20-confirmed-by-eec
 

Lots of news coming from this. According to this certification list, it looks like all the other Vega SKUs may be a lot closer to release than we thought.
There are:

Vega 11 XT
Vega 11 XL
Vega 12 XT
Dual-Vega 10 card for the Instinct series
Vega 20 - but only for Instinct and Pro series, meaning it could be a GV100 competitor not available for consumers.

Vega 11 sounds like it could be the replacement for Polaris 10/20, if they manage to develop an interposer with GPU + a single HBM stack for a similar cost to GPU + 8×32-bit GDDR5 lanes. Since cards in that performance range are heavily inflated in price because of mining, they could get away with selling $300 cards with a small performance boost over Polaris 20. 32 NCUs @ 1.5GHz would probably do well enough.
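A quick sanity check on that last figure (assuming GCN's usual 64 ALUs per NCU and 2 FLOPs per FMA; both numbers are my assumptions, not anything from the certification list):

```python
ncus, alus_per_ncu, flops_per_fma, clock_ghz = 32, 64, 2, 1.5
tflops = ncus * alus_per_ncu * flops_per_fma * clock_ghz / 1000
# ~6.1 TFLOPs, right around Polaris 20's ~6.2 (36 CUs at ~1.34 GHz boost),
# so any real gain over an RX 580 would have to come from the architecture.
print(f"{tflops:.1f} TFLOPs")
```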
Vega 12 might be mostly exclusive to laptops. Probably for MacBook refreshes first, and 4 months later for the plebs.



EDIT: this obviously belongs in the Vega rumors thread. If a mod would be so kind as to transfer the post there, or maybe I can just repost there.
 
Ok, am I an idiot for being excited about the new enhanced v-sync? One that works without a freesync monitor? THAT freaking excites the hell out of me! :D
 
Not necessarily, but you could be if you only bought a Vega because of that single feature (instead of a cheaper Polaris-based card for which it is also enabled - which is nice in itself).
 
Ok, am I an idiot for being excited about the new enhanced v-sync? One that works without a freesync monitor? THAT freaking excites the hell out of me! :D

You're not an idiot, unless you've actually used a FreeSync monitor extensively and gotten to know how spectacularly cool it is to have a game that practically feels like 60 FPS to us mortals (non-esports gods), even though the actual framerate is hovering between ~45 and 55 FPS. And all of this without tearing.

For mGPU users (didn't you have a Fiji Pro Duo?), who tend to get more dips than most, it's a real game-changer.
 
As someone with a 34" Ultrawide non-freesync monitor and likely buying a Vega, it excites me as well.
 
Vega 20 - but only for Instinct and Pro series, meaning it could be a GV100 competitor not available for consumers.

Vega 11 sounds like it could be the replacement for Polaris 10/20, if they manage to develop an interposer with GPU + a single HBM stack for a similar cost to GPU + 8×32-bit GDDR5 lanes. Since cards in that performance range are heavily inflated in price because of mining, they could get away with selling $300 cards with a small performance boost over Polaris 20. 32 NCUs @ 1.5GHz would probably do well enough.
Vega 12 might be mostly exclusive to laptops. Probably for MacBook refreshes first, and 4 months later for the plebs.
Yea, this pretty much aligns with the previous slides: https://videocardz.com/65521/amd-vega-10-and-vega-20-slides-revealed
 
Nope. I guess if one wanted to nitpick, Nvidia gives the end user the option of whether or not to disable sync below the refresh rate, whereas AMD just does it for you. But if one is using Fast/Enhanced Sync anyway, the entire point is the lower latency, so it really doesn't make sense otherwise (unless one just absolutely can't stand any tearing, ever).
 
I just realized something. It would be trivial for NVIDIA to release a GTX1075 (basically the GTX1070M SM configuration + a minor clock bump) at $350 and absolutely ruin Vega. I see no reason for them not to do this.
 
Different aspect of software scheduling, but yes, that's what they have been doing. This would be the compiler generating short instruction sequences that use temporary registers, the register file cache if you will. It would be problematic if GCN did it, as each subsequent instruction comes from a different wave, so the temporary registers would be overwritten.
Huh.

A wavefront will run on a GCN SIMD until it reaches some kind of barrier (flow control, memory access, explicit barrier...), so this can easily be hundreds of instructions, all from a single wavefront running contiguously.

The only meaningful exception is when the instructions won't all fit in instruction cache.
 
I just realized something. It would be trivial for NVIDIA to release a GTX1075 (basically the GTX1070M SM configuration + a minor clock bump) at $350 and absolutely ruin Vega. I see no reason for them not to do this.
What that sounds like is the perfect mining card that no consumer would ever get their hands on.

AMD on the other hand would be delighted to see the RX580/570 back on the shelves for gamers.
 
Ok, am I an idiot for being excited about the new enhanced v-sync? One that works without a freesync monitor? THAT freaking excites the hell out of me! :D

I pointed it out earlier, but EnhancedSync works with Freesync; it's not any kind of replacement for it.

EnhancedSync = active when above max refresh rate

FreeSync = active when below max refresh rate

Nope. I guess if one wanted to nitpick, Nvidia gives the end user the option of whether or not to disable sync below the refresh rate, whereas AMD just does it for you. But if one is using Fast/Enhanced Sync anyway, the entire point is the lower latency, so it really doesn't make sense otherwise (unless one just absolutely can't stand any tearing, ever).

FastSync will cause the same problems as VSync when under the refresh rate, so you'd want to turn it off and run Adaptive-VSync instead there. EnhancedSync just combines the two into one setting.
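A sketch of the combined behavior as I understand it (illustrative logic only, not AMD's actual driver code; the function and its thresholds are made up):

```python
def present_mode(fps, refresh_hz, freesync_range=None):
    """Which mechanism governs presentation at a given framerate.
    freesync_range: (min_hz, max_hz) of a VRR display, or None for a
    fixed-refresh monitor. Hypothetical logic, not AMD's driver."""
    if fps > refresh_hz:
        # Enhanced Sync territory: render unthrottled, scan out the most
        # recently completed frame each refresh (no tearing, lower latency).
        return "enhanced sync (above max refresh)"
    if freesync_range and freesync_range[0] <= fps <= freesync_range[1]:
        # FreeSync territory: the monitor refreshes when a frame is ready.
        return "freesync (variable refresh)"
    # Below max refresh without VRR: stop waiting on vblank, accepting
    # possible tearing instead of VSync's added latency and stutter.
    return "tearing allowed (below max refresh, no VRR)"

print(present_mode(fps=180, refresh_hz=144))                           # enhanced sync
print(present_mode(fps=50, refresh_hz=144, freesync_range=(40, 144)))  # freesync
print(present_mode(fps=50, refresh_hz=60))                             # tearing allowed
```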
 
I pointed it out earlier, but EnhancedSync works with Freesync; it's not any kind of replacement for it.

EnhancedSync = active when above max refresh rate

FreeSync = active when below max refresh rate
He never said anything about it replacing Freesync, only that it's a feature that doesn't require a Freesync monitor.
 
He never said anything about it replacing Freesync, only that it's a feature that doesn't require a Freesync monitor.

My point was it won't make his monitor act like a Freesync one, because it has a completely different purpose (above max refresh vs under).
 
A wavefront will run on a GCN SIMD until it reaches some kind of barrier (flow control, memory access, explicit barrier...), so this can easily be hundreds of instructions, all from a single wavefront running contiguously.

The only meaningful exception is when the instructions won't all fit in instruction cache.
Hmm. I thought it was a 4-cycle cadence with round-robin (with exceptions for priorities) through all the waves (or at least the group) on a SIMD, since a wave couldn't schedule back to back except under certain circumstances? I have no doubt what you suggest would be a bit more efficient, as you could carry the output forward, but my understanding was that everything gets written out each cycle of the cadence and that ability goes largely unused, with the exception of a few complex instructions. All the waves in a group attempt to stay relatively in sync, reducing the burden on the instruction cache, with the "barriers" only preventing a wave from scheduling within that rotation. Nvidia does what you described.
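For what it's worth, the two models being debated would trace out roughly like the toy below (my own sketch, nothing hardware-verified; "--" marks cycles where the SIMD is still working through the previous wave64 instruction, and the greedy trace abstracts instruction latency away entirely):

```python
from itertools import cycle

def gcn_cadence(waves, cycles):
    """Round-robin model: one issue opportunity every 4-cycle cadence,
    rotating across the waves resident on the SIMD."""
    order = cycle(waves)
    return [next(order) if c % 4 == 0 else "--" for c in range(cycles)]

def run_until_stall(waves, cycles, stall_every=6):
    """Greedy model: one wave issues back to back until an (artificial)
    stall, then the next wave takes over."""
    trace, w, issued = [], 0, 0
    for _ in range(cycles):
        trace.append(waves[w])
        issued += 1
        if issued == stall_every:  # pretend a memory access stalls the wave
            w, issued = (w + 1) % len(waves), 0
    return trace

print("cadence:", " ".join(gcn_cadence(["w0", "w1", "w2"], 24)))
print("greedy: ", " ".join(run_until_stall(["w0", "w1", "w2"], 24)))
```

Either way the SIMD stays busy; the difference is whether consecutive issue slots ever belong to the same wave, which is exactly what the temporary-register trick needs.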
 
Huh.

A wavefront will run on a GCN SIMD until it reaches some kind of barrier (flow control, memory access, explicit barrier...), so this can easily be hundreds of instructions, all from a single wavefront running contiguously.

The only meaningful exception is when the instructions won't all fit in instruction cache.

It wouldn't take too long before the per-wave instruction buffer is empty, and the fetch process is subject to variable latencies and arbitration for instruction fetch. AMD indicated that age, utilization, and priority could factor into which buffer is granted a fetch in a given cycle.
Perhaps the arbitration factor can be reduced if it's one wavefront per SIMD and the L1 is not thrashed. Sharing within a CU may still be insufficient, since the L1 instruction cache is shared between multiple CUs.
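Purely as illustration, an arbiter over those inputs might look like the sketch below. The scoring weights are entirely invented (AMD only named the factors), and feeding priority from something like s_setprio is my guess:

```python
def pick_fetch_winner(buffers):
    """buffers: per-wave instruction buffers, each a dict with 'age'
    (cycles since last fetch grant), 'occupancy' (fill level, 0-1) and
    'priority'. Returns the index granted the next I-cache fetch.
    The weights are invented; AMD only said age, utilization, and
    priority factor into the decision."""
    def score(b):
        return b["priority"] * 4 + b["age"] - b["occupancy"] * 8
    return max(range(len(buffers)), key=lambda i: score(buffers[i]))

waves = [
    {"age": 3, "occupancy": 0.9, "priority": 0},  # buffer nearly full
    {"age": 7, "occupancy": 0.1, "priority": 0},  # starved, almost empty
    {"age": 1, "occupancy": 0.5, "priority": 1},  # boosted (e.g. s_setprio)
]
print("fetch granted to wave", pick_fetch_winner(waves))  # -> wave 1
```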
 
BacBeyond said:
FastSync will cause the same problems as VSync when under the refresh rate, so you'd want to turn it off and run Adaptive-VSync instead there. EnhancedSync just combines the two into one setting.
Yes.... that is exactly what I wrote in the first place.....
 