AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Kaotik · Aug 8, 2017

Alexko said:
I wonder whether a few on-package GDDR5(X)/6 chips on an APU might make sense with HBCC. I mean, with just a couple of fast GDDR chips, you could get over 100GB/s, and presumably 2GB.

You'd need 3 chips / 96-bit membus with GDDR5X to reach over 100 GB/s
Would it really be notably cheaper than sticking one 4-Hi HBM2-stack there, which would offer over twice the bandwidth to boot?

Deleted member 13524 · Aug 8, 2017

Alexko said:
I wonder whether a few on-package GDDR5(X)/6 chips on an APU might make sense with HBCC. I mean, with just a couple of fast GDDR chips, you could get over 100GB/s, and presumably 2GB.

What about a single HBM1 stack? If SK Hynix is still making them and Fiji is to be replaced with Vega on all fronts, those single stacks could be cheap enough by now.

Alexko · Aug 8, 2017

Kaotik said:
You'd need 3 chips / 96-bit membus with GDDR5X to reach over 100 GB/s
Would it really be notably cheaper than sticking one 4-Hi HBM2-stack there, which would offer over twice the bandwidth to boot?

Yeah but two 14Gbps GDDR6 chips would provide 112GB/s, which is well over twice the bandwidth provided by a dual-channel DDR4-3200 setup. It's hard to estimate the cost but my guess would be that this would be cheaper than a stack of HBM2. I doubt there's any point in using HBM1 at this point, the low volumes probably make it unattractive from a cost perspective.
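
For anyone who wants to check the bandwidth figures in this exchange, here is a rough sketch of the arithmetic. It assumes each GDDR chip has a 32-bit interface, GDDR5X runs at 10 Gbps per pin, and an HBM2 stack is 1024 bits wide at 2 Gbps per pin. None of those speeds are stated in the posts above, so treat them as illustrative. The function name is made up for this example.

```python
# Peak bandwidth in GB/s = bus width (bits) * per-pin data rate (Gbps) / 8 bits per byte.
def peak_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    return bus_width_bits * data_rate_gbps / 8

GDDR_CHIP_BITS = 32  # assumption: one GDDR5X/GDDR6 chip exposes a 32-bit interface

configs = {
    # Two chips at the 14 Gbps mentioned in the post
    "2x GDDR6 @ 14 Gbps": peak_bandwidth_gbs(2 * GDDR_CHIP_BITS, 14.0),
    # 10 Gbps GDDR5X is an assumed speed grade
    "2x GDDR5X @ 10 Gbps (assumed speed)": peak_bandwidth_gbs(2 * GDDR_CHIP_BITS, 10.0),
    "3x GDDR5X @ 10 Gbps (assumed speed)": peak_bandwidth_gbs(3 * GDDR_CHIP_BITS, 10.0),
    # Two 64-bit channels at 3200 MT/s
    "dual-channel DDR4-3200": peak_bandwidth_gbs(128, 3.2),
    # 1024-bit stack at an assumed 2 Gbps per pin
    "1x HBM2 stack @ 2 Gbps (assumed speed)": peak_bandwidth_gbs(1024, 2.0),
}

for name, bandwidth in configs.items():
    print(f"{name}: {bandwidth:.1f} GB/s")
```

Under those assumptions, two GDDR5X chips fall short of 100 GB/s while three clear it, two 14 Gbps GDDR6 chips reach 112 GB/s, and a single HBM2 stack comes out at more than twice that.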

Anarchist4000 · Aug 8, 2017

Kaotik said:
You'd need 3 chips / 96-bit membus with GDDR5X to reach over 100 GB/s
Would it really be notably cheaper than sticking one 4-Hi HBM2-stack there, which would offer over twice the bandwidth to boot?

I'd even question if a 2-Hi stack was an option. Should still offer full bandwidth, but cheaper and more than sufficient for low end boxes. However 8-Hi might be "cheaper" if it allows doing away with system memory.

Alexko said:
Yeah but two 14Gbps GDDR6 chips would provide 112GB/s, which is well over twice the bandwidth provided by a dual-channel DDR4-3200 setup. It's hard to estimate the cost but my guess would be that this would be cheaper than a stack of HBM2. I doubt there's any point in using HBM1 at this point, the low volumes probably make it unattractive from a cost perspective.

Possibly, but the benefits of more bandwidth for an APU may be worth the trade-off at the bottom of the market. Higher up, HBM is superior in capacity and both of those configurations would be interchangeable.

Alexko · Aug 9, 2017

Anarchist4000 said:
I'd even question if a 2-Hi stack was an option. Should still offer full bandwidth, but cheaper and more than sufficient for low end boxes. However 8-Hi might be "cheaper" if it allows doing away with system memory.

Possibly, but the benefits of more bandwidth for an APU may be worth the trade-off at the bottom of the market. Higher up, HBM is superior in capacity and both of those configurations would be interchangeable.

You know, on second thought

Vega 10 has about 13000 GFLOPS and 480GB/s; that's about 27 FLOP/B,
Raven Ridge (11 CUs, maybe 1GHz) should have about 1400 GFLOPS and perhaps 51.2GB/s (DDR4-3200); that's about 27 FLOP/B too.

So it might not be in such dire need of more bandwidth after all.
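
A small sketch of the FLOP-per-byte comparison, in case it helps. It assumes the usual GCN layout of 64 lanes per CU with one FMA (2 FLOP) per lane per clock. The Vega 10 figures are the rounded ones from the post, and the helper names are made up for this example.

```python
def peak_gflops(compute_units, clock_ghz, lanes_per_cu=64, flop_per_lane=2):
    # Assumed: 64 lanes per CU, one fused multiply-add (2 FLOP) per lane per clock
    return compute_units * lanes_per_cu * flop_per_lane * clock_ghz

def flop_per_byte(gflops, bandwidth_gbs):
    # Ratio of peak compute to peak memory bandwidth
    return gflops / bandwidth_gbs

# Vega 10: rounded figures quoted in the post
vega10_ratio = flop_per_byte(13000, 480)

# Raven Ridge: 11 CUs at a guessed 1 GHz, dual-channel DDR4-3200
raven_ridge_gflops = peak_gflops(11, 1.0)
raven_ridge_ratio = flop_per_byte(raven_ridge_gflops, 51.2)

print(f"Vega 10:     {vega10_ratio:.1f} FLOP/B")
print(f"Raven Ridge: {raven_ridge_gflops:.0f} GFLOPS, {raven_ridge_ratio:.1f} FLOP/B")
```

Both ratios land at roughly 27 FLOP/B. This only compares peak numbers and ignores cache sizes and the CPU's share of the DDR4 bandwidth.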

Anarchist4000 · Aug 9, 2017

Alexko said:
You know, on second thought

Vega 10 has about 13000 GFLOPS and 480GB/s; that's about 27 FLOP/B,

Raven Ridge (11 CUs, maybe 1GHz) should have about 1400 GFLOPS and perhaps 51.2GB/s (DDR4-3200); that's about 27 FLOP/B too.

So it might not be in such dire need of more bandwidth after all.

Vega10 also has 45MB of SRAM, which probably reduces that need a bit, as cache size doesn't scale. You also need to consider that the CPU still has its own work, and there could be bandwidth-intensive applications that would benefit. Power would be another consideration, as HBM would use less energy than DDR4. It's not a bad idea, but HBM would have been designed for this usage.

Deleted member 13524 · Aug 9, 2017

If it's the same architecture, I doubt the iGPU in Raven Ridge will clock at only 1GHz. I think 1.15 to 1.2GHz is a more believable baseline.

Except of course for the lower power mobile versions, in which case the APU won't be using DDR4 3200 anyways.
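
To see how much the clock and memory assumptions move the compute-to-bandwidth ratio, here is a quick sweep. It again assumes 11 CUs with 64 lanes each and 2 FLOP per lane per clock. DDR4-2400 is only a hypothetical stand-in for a slower mobile configuration, since no specific speed is named above.

```python
def peak_gflops(compute_units, clock_ghz):
    # Assumed: 64 lanes per CU, 2 FLOP (one FMA) per lane per clock
    return compute_units * 64 * 2 * clock_ghz

def dual_channel_ddr4_gbs(mega_transfers_per_s):
    # Two 64-bit channels move 16 bytes per transfer
    return mega_transfers_per_s * 16 / 1000

CLOCKS_GHZ = (1.0, 1.15, 1.2)
MEMORY_SPEEDS = (3200, 2400)  # DDR4-2400 is a hypothetical mobile example

ratios = {}
for speed in MEMORY_SPEEDS:
    bandwidth = dual_channel_ddr4_gbs(speed)
    for clock in CLOCKS_GHZ:
        ratios[(speed, clock)] = peak_gflops(11, clock) / bandwidth
        print(f"DDR4-{speed}, {clock:.2f} GHz: {ratios[(speed, clock)]:.1f} FLOP/B")
```

At 1.2GHz on DDR4-3200 the ratio rises to about 33 FLOP/B, and slower memory pushes it higher still, so the GPU gets somewhat more bandwidth-starved than the 1GHz estimate suggests.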

Arnold Beckenbauer · Aug 9, 2017

Ryzen 7 2700U versus Intel i5-7260U with Iris 640
https://gfxbench.com/compare.jsp?be...+i5-7260U+CPU+with+Iris(TM)+Plus+Graphics+640

And
Ryzen 7 2700U versus A12-9800E (desktop CPU with TDP of 35W)
https://gfxbench.com/compare.jsp?be...D+A12-9800E+RADEON+R7,+12+COMPUTE+CORES+4C+8G

DavidGraham · Aug 9, 2017

RX Vega TimeSpy GPU scores:

1630MHz: 7209
1750MHz: 7580
GTX 1080 stock : ~7400

Driver version is new : 22.19.666.1

http://www.3dmark.com/compare/spy/2192756
http://www.3dmark.com/compare/spy/2192971
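
A quick way to read these scores is to compare the score gain against the clock gain. The sketch below uses only the numbers listed above and assumes the listed clocks were actually held during the run, which is not confirmed.

```python
scores = {1630: 7209, 1750: 7580}  # clock in MHz -> TimeSpy GPU score
GTX_1080_STOCK = 7400              # approximate score quoted above

clock_gain = 1750 / 1630 - 1
score_gain = scores[1750] / scores[1630] - 1
scaling = score_gain / clock_gain  # 1.0 would mean perfect scaling with clock

print(f"Clock gain: {clock_gain:.1%}")
print(f"Score gain: {score_gain:.1%}")
print(f"Scaling efficiency: {scaling:.2f}")

for clock, score in scores.items():
    print(f"{clock}MHz vs GTX 1080 stock: {score / GTX_1080_STOCK - 1:+.1%}")
```

The score rises by about 5% for a clock increase of about 7%, so roughly 70% of the extra clock shows up in the result, if the nominal clocks are representative.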

Scott_Arm · Aug 9, 2017

DavidGraham said:
RX Vega TimeSpy GPU scores:

1630MHz: 7209
1750MHz: 7580
GTX 1080 stock : ~7400

Driver version is new : 22.19.666.1

http://www.3dmark.com/compare/spy/2192756
http://www.3dmark.com/compare/spy/2192971

1750MHz would be the water-cooled version, correct?

Rootax · Aug 9, 2017

I guess so, yes. It's a pretty sad result considering the delay, power usage and so on, but they have no choice but to release it, I guess. I hope some Vega tech will be used in Navi, otherwise it would be such a waste.

Cat Merc · Aug 9, 2017

DavidGraham said:
RX Vega TimeSpy GPU scores:

1630MHz: 7209
1750MHz: 7580
GTX 1080 stock : ~7400

Driver version is new : 22.19.666.1

http://www.3dmark.com/compare/spy/2192756
http://www.3dmark.com/compare/spy/2192971

Mind you, both of those are DPM7 clocks of Air and Water respectively. We don't know if it's actually sustaining it.

Anyway, I went ahead and looked at the score split:
1080 Ti is 17% faster in graphics test 1, and 33% faster in graphics test 2

From 3DMark technical guide:

Graphics test 1 focuses more on rendering of transparent elements. It utilizes the A-buffer heavily to render transparent geometries and big particles in an order-independent manner. Graphics test 1 draws particle shadows for selected light sources. Ray-marched volumetric illumination is enabled only for the directional light. All post-processing effects are enabled.

Graphics test 2 focuses more on ray-marched volume illumination with hundreds of shadowed and unshadowed spot lights. The A-buffer is used to render glass sheets in an order-independent manner. Also, lots of small particles are simulated and drawn into the A-buffer. All post-processing effects are enabled.

Speculate as you will.

w0lfram · Aug 9, 2017

I have a simple question.
When discussing "bandwidth", between which components within a GPU does bandwidth matter, back and forth, at such bit depth?

Memory?

Rasterizer · Aug 9, 2017

DavidGraham said:
RX Vega TimeSpy GPU scores:

1630MHz: 7209
1750MHz: 7580
GTX 1080 stock : ~7400

Driver version is new : 22.19.666.1

http://www.3dmark.com/compare/spy/2192756
http://www.3dmark.com/compare/spy/2192971

Well, seeing as an air cooled Vega FE scored 7,126 on June 28 using driver 22.19.384.2, my inclination is to suspect old drivers because the alternative would require believing that enabling AVFS and DSBR in RX Vega's drivers is worth exactly ~0%, and that nothing has been achieved in terms of correcting the memory bandwidth issues on Vega FE, but that would seem to be incompatible with the ETH hashrate rumours.

DavidGraham · Aug 9, 2017

Scott_Arm said:
1750MHz would be the water-cooled version, correct?

Yes Indeed.

Rasterizer said:
Well, seeing as an air cooled Vega FE scored 7,126 on June 28 using driver 22.19.384.2, my inclination is to suspect old drivers

As previously stated by AMD to PCPer and GamersNexus, the Vega FE driver already has all the gaming optimizations made up to its release. So the RX driver could really have had almost nothing new to bring to the table in that regard.

Rasterizer said:
the alternative would require believing that enabling AVFS and DSBR in RX Vega's drivers is worth exactly ~0%,

AMD already implied not to expect great differences due to the activation of DSBR.

Rasterizer said:
nothing has been achieved in terms of correcting the memory bandwidth issues on Vega FE

Who said memory bandwidth needed a driver to be corrected?

Deleted member 2197 · Aug 9, 2017

So basically no FP64 on the Instinct MI25 accelerators?

As is always the case for such accelerator-based architectures, the majority of the flops are supplied by the GPUs. In this case, each MI25 delivers 12.3 teraflops of single precision floating point (FP32) or 24.6 teraflops of half precision (FP16). Together they account for more than 95 percent of the system’s floating point computational power.

One of the main advantages of the Project 47 machine is that it is able to deliver a lot of floating point horsepower within a relatively small power envelope. AMD is claiming the system delivers 30 gigaflops per watt of FP32 operations, which would put it at or near the top of the Green500 list if somehow those FP32 operations could be transformed to FP64. Alas, these latest Radeon parts have little 64-bit capability, making the comparison somewhat irrelevant. The current Green500 champ is TSUBAME 3.0, which turned in a power efficiency of 14.1 gigaflops per watt based on (FP64) Linpack.

https://www.top500.org/news/amd-demos-petaflop-in-a-rack-supercomputer/

Cat Merc · Aug 9, 2017

pharma said:
So basically no FP64 on the Instinct MI25 accelerators?

https://www.top500.org/news/amd-demos-petaflop-in-a-rack-supercomputer/

We've known this for a while. Vega 10 is 1/16 FP64.
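
For a rough sense of what a 1/16 rate means against the quoted figures, here is a back-of-the-envelope sketch. Scaling the whole system's FP32 efficiency by 1/16 is a crude assumption: it ignores the CPUs' contribution and the gap between peak and measured Linpack throughput. The 14.1 figure for TSUBAME 3.0 is read here as gigaflops per watt.

```python
MI25_FP32_TFLOPS = 12.3             # from the quoted article
FP64_RATE = 1 / 16                  # FP64 rate relative to FP32, as stated above
SYSTEM_FP32_GFLOPS_PER_WATT = 30.0  # AMD's claim for Project 47
TSUBAME3_GFLOPS_PER_WATT = 14.1     # Green500 figure from the quoted article

# Peak FP64 throughput of one MI25 at a 1/16 rate
mi25_fp64_tflops = MI25_FP32_TFLOPS * FP64_RATE

# Crude assumption: system efficiency scales directly with the GPU's FP64 rate
naive_fp64_gflops_per_watt = SYSTEM_FP32_GFLOPS_PER_WATT * FP64_RATE

print(f"MI25 peak FP64: {mi25_fp64_tflops:.2f} TFLOPS")
print(f"Naive FP64 efficiency: {naive_fp64_gflops_per_watt:.2f} GFLOPS/W")
print(f"TSUBAME 3.0 advantage: {TSUBAME3_GFLOPS_PER_WATT / naive_fp64_gflops_per_watt:.1f}x")
```

That puts one MI25 at under 1 TFLOPS of peak FP64 and the naive system efficiency at under 2 GFLOPS/W, which is why the Green500 comparison doesn't really apply.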

AlexV · Aug 9, 2017

This may be of interest, apologies if already posted: http://developer.amd.com/wordpress/media/2013/12/Vega_Shader_ISA_28July2017.pdf.

Deleted member 13524 · Aug 9, 2017

AlexV said:
This may be of interest, apologies if already posted: http://developer.amd.com/wordpress/media/2013/12/Vega_Shader_ISA_28July2017.pdf.

Curiously, I didn't find anything when searching for "crypto" or "mining". Weren't there mining-specific instructions?
I'm pretty sure there was a slide claiming that.

3dilettante · Aug 9, 2017

ToTTenTranz said:
Curiously, I didn't find anything when searching for "crypto" or "mining". Weren't there mining-specific instructions?
I'm pretty sure there was a slide claiming that.

I think the specifically cited instruction was the XAD_U32 instruction, which the document indicates is meant to accelerate SHA256 hashing.
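
For illustration, here is a sketch of what a fused XOR-add could look like in software. It assumes the instruction computes (S0 XOR S1) + S2 with 32-bit wraparound, which is worth checking against the linked ISA document. The SHA-256 usage shown (folding the last XOR of the small sigma0 function into the following add) is just one place where such a fusion would fit, not something the thread or AMD states. All function names are made up for this example.

```python
MASK32 = 0xFFFFFFFF

def xad_u32(s0, s1, s2):
    # Assumed semantics: (s0 XOR s1) + s2, wrapping at 32 bits
    return ((s0 ^ s1) + s2) & MASK32

def rotr32(x, n):
    # 32-bit rotate right
    return ((x >> n) | (x << (32 - n))) & MASK32

def sigma0_plus_word_reference(x, w):
    # SHA-256 small sigma0 followed by a 32-bit add, written step by step
    sigma0 = rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3)
    return (sigma0 + w) & MASK32

def sigma0_plus_word_fused(x, w):
    # Same result, with the final XOR and the add done by one xad_u32 call
    partial = rotr32(x, 7) ^ rotr32(x, 18)
    return xad_u32(partial, x >> 3, w)

x, w = 0x6A09E667, 0xBB67AE85
print(hex(sigma0_plus_word_reference(x, w)))
print(hex(sigma0_plus_word_fused(x, w)))
```

Both functions print the same value, so the fused form saves one operation per sigma-plus-add step under these assumptions.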
