AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

I wonder whether a few on-package GDDR5(X)/6 chips on an APU might make sense with HBCC. I mean, with just a couple of fast GDDR chips, you could get over 100GB/s, and presumably 2GB.
You'd need 3 chips / 96-bit membus with GDDR5X to reach over 100 GB/s
Would it really be notably cheaper than sticking one 4-Hi HBM2-stack there, which would offer over twice the bandwidth to boot?
 
I wonder whether a few on-package GDDR5(X)/6 chips on an APU might make sense with HBCC. I mean, with just a couple of fast GDDR chips, you could get over 100GB/s, and presumably 2GB.
What about a single HBM1 stack? If SK Hynix is still making them and Fiji is to be replaced with Vega on all fronts, those single stacks could be cheap enough by now.
 
You'd need 3 chips / 96-bit membus with GDDR5X to reach over 100 GB/s
Would it really be notably cheaper than sticking one 4-Hi HBM2-stack there, which would offer over twice the bandwidth to boot?

Yeah but two 14Gbps GDDR6 chips would provide 112GB/s, which is well over twice the bandwidth provided by a dual-channel DDR4-3200 setup. It's hard to estimate the cost but my guess would be that this would be cheaper than a stack of HBM2. I doubt there's any point in using HBM1 at this point, the low volumes probably make it unattractive from a cost perspective.
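
For anyone wanting to check the numbers in this exchange, here is a rough sketch of the peak-bandwidth arithmetic. The 14 Gbps GDDR6 and DDR4-3200 figures come from the posts above. The 10 Gbps GDDR5X speed and the 2 Gbps HBM2 pin rate are assumptions picked for illustration, not figures from the thread, and the function name is made up.

```python
def peak_bandwidth_gbs(pin_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s = per-pin data rate (Gbit/s) * bus width (bits) / 8."""
    return pin_rate_gbps * bus_width_bits / 8


# Interface widths: 32 bits per GDDR chip, 64 bits per DDR4 channel,
# 1024 bits per HBM2 stack.
configs = {
    "2x GDDR6 @ 14 Gbps": peak_bandwidth_gbs(14.0, 2 * 32),
    # Assumed speed grade, not stated in the thread.
    "3x GDDR5X @ 10 Gbps": peak_bandwidth_gbs(10.0, 3 * 32),
    "dual-channel DDR4-3200": peak_bandwidth_gbs(3.2, 2 * 64),
    # Assumed pin rate, not stated in the thread.
    "1x HBM2 stack @ 2 Gbps": peak_bandwidth_gbs(2.0, 1024),
}

for name, gbs in configs.items():
    print(f"{name}: {gbs:.1f} GB/s")
```

These are theoretical peaks only; sustained bandwidth on real hardware will be lower.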
 
You'd need 3 chips / 96-bit membus with GDDR5X to reach over 100 GB/s
Would it really be notably cheaper than sticking one 4-Hi HBM2-stack there, which would offer over twice the bandwidth to boot?
I'd even question if a 2-Hi stack was an option. Should still offer full bandwidth, but cheaper and more than sufficient for low end boxes. However 8-Hi might be "cheaper" if it allows doing away with system memory.

Yeah but two 14Gbps GDDR6 chips would provide 112GB/s, which is well over twice the bandwidth provided by a dual-channel DDR4-3200 setup. It's hard to estimate the cost but my guess would be that this would be cheaper than a stack of HBM2. I doubt there's any point in using HBM1 at this point, the low volumes probably make it unattractive from a cost perspective.
Possibly, but the benefits of more bandwidth for an APU may be worth the trade-off at the bottom of the market. Higher up, HBM is superior in capacity, and both of those configurations would be interchangeable.
 
I'd even question if a 2-Hi stack was an option. Should still offer full bandwidth, but cheaper and more than sufficient for low end boxes. However 8-Hi might be "cheaper" if it allows doing away with system memory.

Possibly, but the benefits of more bandwidth for an APU may be worth the trade-off at the bottom of the market. Higher up, HBM is superior in capacity, and both of those configurations would be interchangeable.

You know, on second thought
  • Vega 10 has about 13000 GFLOPS and 480GB/s; that's about 27 FLOP/B,
  • Raven Ridge (11 CUs, maybe 1GHz) should have about 1400 GFLOPS and perhaps 51.2GB/s (DDR4-3200); that's about 27 FLOP/B too.
So it might not be in such dire need of more bandwidth after all.
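
The ratio above can be reproduced with a quick sketch. It assumes the usual GCN layout of 64 FP32 lanes per CU and counts an FMA as 2 FLOP; the Vega 10 figures are the rounded ones from the post, and the 1 GHz Raven Ridge clock is the post's own guess. The function names are made up for illustration.

```python
LANES_PER_CU = 64   # FP32 lanes per GCN compute unit
FLOP_PER_FMA = 2    # one fused multiply-add counts as two FLOP


def gcn_fp32_gflops(compute_units: int, clock_ghz: float) -> float:
    """Theoretical peak FP32 throughput in GFLOPS."""
    return compute_units * LANES_PER_CU * FLOP_PER_FMA * clock_ghz


def flop_per_byte(gflops: float, bandwidth_gbs: float) -> float:
    """Compute-to-bandwidth ratio in FLOP per byte."""
    return gflops / bandwidth_gbs


# Vega 10: rounded figures taken directly from the post.
vega10_ratio = flop_per_byte(13000.0, 480.0)

# Raven Ridge: 11 CUs at a guessed 1 GHz, fed by dual-channel DDR4-3200.
raven_gflops = gcn_fp32_gflops(11, 1.0)
raven_ratio = flop_per_byte(raven_gflops, 51.2)

print(f"Vega 10:     {vega10_ratio:.1f} FLOP/B")
print(f"Raven Ridge: {raven_gflops:.0f} GFLOPS, {raven_ratio:.1f} FLOP/B")
```

Note that on an APU the CPU shares the same DDR4 bandwidth, so the ratio the iGPU actually sees would be higher than this.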
 
You know, on second thought
  • Vega 10 has about 13000 GFLOPS and 480GB/s; that's about 27 FLOP/B,
  • Raven Ridge (11 CUs, maybe 1GHz) should have about 1400 GFLOPS and perhaps 51.2GB/s (DDR4-3200); that's about 27 FLOP/B too.
So it might not be in such dire need of more bandwidth after all.
Vega 10 also has 45MB of SRAM, which probably reduces that need a bit, as cache size doesn't scale. You also need to consider that the CPU still has its own work to do, and there could be bandwidth-intensive applications that would benefit. Power would be another consideration, as HBM would use less energy than DDR4. It's not a bad idea, but HBM would have been designed for this usage.
 
If it's the same architecture, I doubt the iGPU in Raven Ridge will clock at only 1GHz. I think 1.15 to 1.2GHz is a more believable baseline.

Except of course for the lower power mobile versions, in which case the APU won't be using DDR4 3200 anyways.
 
I guess so, yes. It's a pretty sad result considering the delay, the power usage and so on, but they have no choice but to release it, I guess. I hope some Vega tech will be used in Navi, otherwise it would be such a waste.
 
RX Vega TimeSpy GPU scores:

1630MHz: 7209
1750MHz: 7580
GTX 1080 stock : ~7400

Driver version is new : 22.19.666.1

http://www.3dmark.com/compare/spy/2192756
http://www.3dmark.com/compare/spy/2192971
Mind you, both of those are DPM7 clocks of Air and Water respectively. We don't know if it's actually sustaining it.

Anyway, I went ahead and looked at the score split:
1080 Ti is 17% faster in graphics test 1, and 33% faster in graphics test 2

From 3DMark technical guide:
Graphics test 1 focuses more on rendering of transparent elements. It utilizes the A-buffer heavily to render transparent geometries and big particles in an order-independent manner. Graphics test 1 draws particle shadows for selected light sources. Ray-marched volumetric illumination is enabled only for the directional light. All post-processing effects are enabled.

Graphics test 2 focuses more on ray-marched volume illumination with hundreds of shadowed and unshadowed spot lights. The A-buffer is used to render glass sheets in an order-independent manner. Also, lots of small particles are simulated and drawn into the A-buffer. All post-processing effects are enabled.

[Image: V0rrtFt.png]


Speculate as you will.
 
I have a simple question.
When discussing "bandwidth", between which components within a GPU does the back-and-forth bandwidth matter, at such bit depths?

Memory?
 
RX Vega TimeSpy GPU scores:

1630MHz: 7209
1750MHz: 7580
GTX 1080 stock : ~7400

Driver version is new : 22.19.666.1

http://www.3dmark.com/compare/spy/2192756
http://www.3dmark.com/compare/spy/2192971

Well, seeing as an air cooled Vega FE scored 7,126 on June 28 using driver 22.19.384.2, my inclination is to suspect old drivers because the alternative would require believing that enabling AVFS and DSBR in RX Vega's drivers is worth exactly ~0%, and that nothing has been achieved in terms of correcting the memory bandwidth issues on Vega FE, but that would seem to be incompatible with the ETH hashrate rumours.
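
To put the scores quoted in this thread side by side, here is a small sketch that compares the clock and score deltas. It uses only the numbers posted above; whether the cards actually sustained the listed DPM7 clocks is unknown, so the scaling figure is indicative at best. The helper name is made up.

```python
def pct_gain(new: float, old: float) -> float:
    """Relative gain of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


# Time Spy graphics scores and listed clocks from the thread.
rx_air_clock, rx_air_score = 1630, 7209
rx_water_clock, rx_water_score = 1750, 7580
vega_fe_air_score = 7126  # June 28 result, driver 22.19.384.2

clock_gain = pct_gain(rx_water_clock, rx_air_clock)
score_gain = pct_gain(rx_water_score, rx_air_score)
rx_vs_fe = pct_gain(rx_air_score, vega_fe_air_score)

print(f"Water vs air clock: +{clock_gain:.1f}%")
print(f"Water vs air score: +{score_gain:.1f}%")
print(f"Score gain per unit of clock gain: {score_gain / clock_gain:.2f}")
print(f"RX Vega (air) vs Vega FE (air): +{rx_vs_fe:.1f}%")
```

The air-cooled RX Vega result comes out only about 1% ahead of the Vega FE score, which is the gap the post above calls "~0%".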
 
1750MHz would be the water-cooled version, correct?
Yes, indeed.
Well, seeing as an air cooled Vega FE scored 7,126 on June 28 using driver 22.19.384.2, my inclination is to suspect old drivers
As previously stated by AMD to PCPer and GamersNexus, the Vega FE driver already had all the gaming optimizations available up to its release. So the RX driver could really have had almost nothing new to bring to the table in that regard.
the alternative would require believing that enabling AVFS and DSBR in RX Vega's drivers is worth exactly ~0%,
AMD already implied not to expect great differences due to the activation of DSBR.
nothing has been achieved in terms of correcting the memory bandwidth issues on Vega FE
Who said memory bandwidth needed a driver to be corrected?
 
So basically no FP64 on the Instinct MI25 accelerators?

As is always the case for such accelerator-based architectures, the majority of the flops are supplied by the GPUs. In this case, each MI25 delivers 12.3 teraflops of single precision floating point (FP32) or 24.6 teraflops of half precision (FP16). Together they account for more than 95 percent of the system’s floating point computational power.

One of the main advantages of the Project 47 machine is that it is able to deliver a lot of floating point horsepower within a relatively small power envelope. AMD is claiming the system delivers 30 gigaflops per watt of FP32 operations, which would put it at or near the top of the Green500 list if somehow those FP32 operations could be transformed to FP64. Alas, these latest Radeon parts have little 64-bit capability, making the comparison somewhat irrelevant. The current Green500 champ is TSUBAME 3.0, which turned in a power efficiency of 14.1 gigaflops per watt based on (FP64) Linpack.
https://www.top500.org/news/amd-demos-petaflop-in-a-rack-supercomputer/
 