AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Anarchist4000 · Jan 5, 2017

seahawk said:
Or it is just a fancy new name for a cache feeding HBM memory.

It's more fun to call the HBM a cache, since it's on the package, and then call an SSD your video memory. That way the marketing department can advertise 1TB of VRAM compared to Nvidia's 12GB in the case of Titan.

Gipsel · Jan 5, 2017

CarstenS said:
Gipsel, why are you sure HBC is not HBM? I did not see that in the leaked slides.

I'm not sure. I was wrong. I somehow misunderstood it at first glance.

Gipsel · Jan 5, 2017

Anarchist4000 said:
The only way to get twice the FP32 throughput per clock is if the SIMDs got chained together and executed consecutive instructions in a single cycle. I wouldn't call simply doubling the number of ALUs "twice the throughput"; that phrase should mean each ALU doubled its throughput. Or they are quoting native FP64 performance, which doesn't appear to be the case.

They specifically talked about the throughput of one NCU compared to one traditional CU with 64 SPs. And doubling the number of SPs is the simple way of doing it. How they would feed a higher number of SPs and how they are organized, no idea. Could be dual issue to two separate vector ALUs each clock. Or something more out of the ordinary like dual issue to the same vALU and computing over 8 cycles to match a round robin scheme over 8 vALUs (I don't think this will happen) or something else. There is also the possibility they can somehow fuse certain combinations of ops in the scheduler and issue the fused ops (meaning the higher throughput is only usable for relatively specific cases). This is still unknown right now.

seahawk · Jan 5, 2017

If you put more SPs into the CU, the Truck graphic makes even more sense.

Arnold Beckenbauer · Jan 5, 2017

revan said:
Waiting for experts' comments:

http://videocardz.com/65406/exclusive-amd-vega-presentation

Isn't that the leaker's job?

Gipsel · Jan 5, 2017

ToTTenTranz said:
New slides have been coming up:

http://cdn.videocardz.com/1/2017/01/AMD-VEGA-VIDEOCARDZ-37.jpg
11 polygons with 4 geometry engines?
Wut?

Up to 11 triangles get clipped/rejected per clock I guess. Up to now, the geometry throughput of AMD GPUs doesn't change that much depending on the visibility of the triangles.

Anarchist4000 · Jan 5, 2017

Gipsel said:
They specifically talked about the throughput of one NCU compared to one traditional CU with 64 SPs. And doubling the number of SPs is the simple way of doing it. How they would feed a higher number of SPs and how they are organized, no idea. Could be dual issue to two separate vector ALUs each clock. Or something more out of the ordinary like dual issue to the same vALU and computing over 8 cycles to match a round robin scheme over 8 vALUs (I don't think this will happen) or something else. There is also the possibility they can somehow fuse certain combinations of ops in the scheduler and issue the fused ops (meaning the higher throughput is only usable for relatively specific cases). This is still unknown right now.

Edited my response earlier, but it could be FMA4 style instructions with 4 operands. With all the packed math being performed that would make a lot of sense.

EDIT: It would also work well with that scalar-per-SIMD design I was theorizing. When not using the 4th operand, it could feed 16x4 scalar registers into L0 registers for a scalar. Translating the opcodes to do that shouldn't be difficult. Bulldozer had the FMA4 instructions, and I think GCN had the extra operands, but they were used to feed the single scalar or move data around.

xEx · Jan 5, 2017

the ve.ga site is down right now :runaway:

Arnold Beckenbauer · Jan 5, 2017

xEx said:
the ve.ga site is down right now

Not for me. ~ 50 minutes.

sebbbi · Jan 5, 2017

Gipsel said:
No more ROP caches (handled by L2 now).

About time! I have been fixing and improving console GCN2 cache management code in the past two weeks. I am happy to hear that L2 handles ROPs now as well (even if there are still some tiny L1 ROP caches). Much less L2 flushing needed. Should be good for async compute as well.

seahawk · Jan 5, 2017

Yep, all those changes point to CUs with more SPs. 32CU @ 128SP - anybody?

Urian · Jan 5, 2017

I believe the CU stays the same; the 128 ops come from FMADD (2 ops per component).
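
To make the two readings in this thread concrete, here is a rough back-of-the-envelope sketch. It assumes a fused multiply-add is counted as 2 ops per lane per clock; the function name `ops_per_clock` and the variable names are made up for illustration, and nothing here is confirmed by AMD.

```python
def ops_per_clock(lanes, ops_per_lane=2):
    """FP32 ops per clock for one CU, counting an FMA as 2 ops (assumption)."""
    return lanes * ops_per_lane

# Reading 1: the CU still has 64 SPs and "128 ops" just counts FMA as 2 ops.
same_cu = ops_per_clock(64)

# Reading 2: the CU doubles to 128 SPs, which would give 256 ops by the same count.
wider_cu = ops_per_clock(128)

# The "32CU @ 128SP" guess from earlier in the thread, as a total SP count.
total_sps = 32 * 128

print(same_cu, wider_cu, total_sps)
```

If the slide figure really is 128 ops per CU per clock, the first reading matches it without any change to the CU width, which is the point of the FMADD argument.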

Rootax · Jan 5, 2017

I'm more curious about the ROPs and geometry (the strange 11 triangles instead of 4 on Fiji seems nice). Fiji was already a "compute" monster, imo...

xEx · Jan 5, 2017

xEx said:
the ve.ga site is down right now

again

revan · Jan 5, 2017

https://www.computerbase.de/2017-01/amd-vega-preview/

Malo · Jan 5, 2017

Is there a live stream to the event? It just started.

xEx · Jan 5, 2017

Malo said:
Is there a live stream to the event? It just started.

http://videocardz.com/65470/watch-vega-architecture-preview-here

pTmdfx · Jan 5, 2017

It is quite interesting that they turn the local graphics memory into a cache. But it remains to be seen whether it is a page table and software magic (like Linux VMM) or a real hardware cache.
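
For anyone unsure what the "page table and software magic" option would look like, here is a toy sketch of a software-managed scheme: local memory holds a fixed number of pages from a larger backing store, and a miss triggers a fetch plus an LRU eviction. This only illustrates the general idea. The class name `PagedLocalMemory`, the 4 KiB page size and the LRU policy are all assumptions, not anything AMD has described.

```python
from collections import OrderedDict

class PagedLocalMemory:
    """Toy model: local memory holds a few pages of a larger backing store."""

    def __init__(self, capacity_pages, page_size=4096):
        self.capacity_pages = capacity_pages  # pages that fit in local memory
        self.page_size = page_size            # hypothetical page size in bytes
        self.resident = OrderedDict()         # page number -> True, oldest first
        self.hits = 0
        self.faults = 0

    def access(self, address):
        """Return True on a hit, False on a fault that pages the data in."""
        page = address // self.page_size
        if page in self.resident:
            self.hits += 1
            self.resident.move_to_end(page)    # mark as most recently used
            return True
        self.faults += 1
        if len(self.resident) >= self.capacity_pages:
            self.resident.popitem(last=False)  # evict the least recently used page
        self.resident[page] = True
        return False

mem = PagedLocalMemory(capacity_pages=2)
trace = [0, 100, 4096, 0, 8192, 4096]
results = [mem.access(a) for a in trace]
print(results, mem.hits, mem.faults)
```

A real hardware cache would do the lookup and replacement without a driver or OS fault handler in the loop, which is the distinction being asked about.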

Malo · Jan 5, 2017

xEx said:
http://videocardz.com/65470/watch-vega-architecture-preview-here

ah ok

Looks like there is no VEGA live stream, but AMD indeed revealed new details about its new GPU.

WTF is the use of having a countdown on ve.ga? So they could load that web page (which was being hammered) at the CES event instead of running something local?

SimBy · Jan 5, 2017

Is the size (520-530 mm²) double confirmed?
