AMD Vega Hardware Reviews

ArkeoTP · Jul 5, 2017

PCPer's Vega FE Crossfire benchmarks are out.

https://www.pcper.com/reviews/Graphics-Cards/AMD-Radeon-Vega-Frontier-Edition-CrossFire-Testing

It doesn't look good, although as a former Crossfire user I'm not sure I should be surprised.

AnomalousEntity · Jul 5, 2017

Anarchist4000 said:
A8R8G8B8.

So the ALUs do math in INT8 because my output format is 8-bit? Have you ever written a pixel shader?

Clukos · Jul 5, 2017

http://www.gamersnexus.net/guides/2977-vega-fe-vs-fury-x-at-same-clocks-ipc

3dilettante · Jul 5, 2017

CarstenS said:
Anyway: „We“ are already including SIMD units in IPC (whether or not that is correct in the strict sense of the word), right?

SIMD units make up a portion of the pipeline that instructions are issued to, so they are part of it in the sense that there needs to be something that executes instructions in a pipeline.
A SIMD instruction is still an instruction, but the FLOP count per instruction is not directly related to IPC. If that were the case, Intel's Knights Landing core would be considered as having higher IPC than the desktop x86 cores.

The SI portion of SIMD is Single Instruction, and IPC would be more concerned with what happens in terms of the instruction stream than the MD portion, which can be scaled horizontally within a pipeline's execution stage without disrupting how the pipeline handles code flow, instruction issue, hazards, or stall conditions.
A major motivation for having SIMD at all is that it amortizes the expensive hardware concerned with IPC over more data.

I doubt that anyone ever counted only the schedulers and dispatcher per CU/SM as indicative of IPC.

I'm trying to find more examples of where AMD used the term IPC for GCN besides Vega. You can find any number of architectural descriptions for IPC for Zen and other CPU cores, although those are superscalar cores that actively work to extract utilization out of one instruction stream.

GCN has for generations defined a ceiling IPC of 1, with any gains found in multithreaded throughput or measures to avoid stalls that would drive instruction issue below 1. Most of the marketing has been about utilization of the hardware and some token measures for single-threaded performance. That Vega's marketing made such a pointed reference to IPC this time around has more implications in part because that's not what GCN has been about.

There are other measurements, such as throughput and utilization of peak, that can capture the sort of performance GCN has targeted without invoking IPC and all that it brings up.

AlNom · Jul 5, 2017

DOOM (Vulkan, Ultra, 0xAA, Async)

I thought they needed to use one of the AA options to even enable async.

ArkeoTP · Jul 5, 2017

AlNets said:
I thought they needed to use one of the AA options to even enable async.

Unless it was changed in an update, async was only active when using either no AA or TSSAA. Any of the other AA options disable async.

bdmosky · Jul 5, 2017

Clukos said:
http://www.gamersnexus.net/guides/2977-vega-fe-vs-fury-x-at-same-clocks-ipc

I don't understand them not trying to equalize bandwidth--even if it required downclocking the Fury X

silent_guy · Jul 5, 2017

bdmosky said:
I don't understand them not trying to equalize bandwidth--even if it required downclocking the Fury X

Even if they equalized the clocks, there are probably differences in memory timings between HBM1 and HBM2 that matter more than the current 5% difference in clock speeds.

The current settings are sufficient for the purpose of this benchmark.

AlNom · Jul 5, 2017

ArkeoTP said:
Unless it was changed in an update, async was only active when using either no AA or TSSAA. Any of the other AA options disable async.

Thanks. Curious they wouldn't want to enable TSSAA anyway.

Clukos · Jul 5, 2017

AlNets said:
Thanks. Curious they wouldn't want to enable TSSAA anyway.

A fair number of PC gamers don't really grasp what temporal AA is and equate it to "that's a console thing, where's my MSAA!!!"

I just cringe when I see people running SLI/CF setups (or a high-end single GPU) turn off temporal AA at 4K because "you don't need AA at that resolution anyway".

Cyan · Jul 5, 2017

Love_In_Rio said:
https://videocardz.com/70777/amd-radeon-rx-vega-3dmark11-performance

That's the result of one of the multiple overclocked results over 1630 MHz of the same card. The non-overclocked one is still slower than a stock 1080. So, still similar results to Vega FE.

We shall see how everything unfolds. For now, as long as Vega gives me 4K60 in all my games, I'd be happy to get it. That's AMD's minimum goal with Vega after all: 4K60. I've saved 400€ so far to get a 4K-capable GPU in the future, but I'm not in a hurry.

ArkeoTP · Jul 5, 2017

Clukos said:
I just cringe when I see people running SLI/CF setups (or a high-end single GPU) turn off temporal AA at 4K because "you don't need AA at that resolution anyway".

But you may want to turn off TAA more often than not with a mGPU setup as temporal techniques aren't AFR friendly and can cause problems with scaling and frame pacing.

But yeah, you always need more AA*

*Until you reach the limits of irresponsibility with something like 32xS HSAA and 8x SGSSAA combined, but at that point your 9-year-old game is running at 10 fps on a 1080 Ti with 10+ gigs of memory usage and you're probably doing it 'cause you're bored.

Increasing spatial resolution helps everything somewhat, but it's not the solution to end all solutions. I have a friend who dislikes playing BF4 because the temporal aliasing in that game is really bothersome even at 4K with 200% res scaling. Needless to say, he was delighted by the addition of TAA to BF1.

Clukos said:
A fair number of PC gamers don't really grasp what temporal AA is and equate it to "that's a console thing, where's my MSAA!!!"

I definitely know people like this who bash modern analytical and temporal methods because they commit the unholy PC sin of introducing blur to the image. Common signs include championing SMAA 1x and not realising a screenshot has blurry AA until someone points it out.

Wait, we're going way too off-topic, aren't we?

Perhaps I should create a separate thread for pure appreciation of anti-aliasing and what it has done for us.

BacBeyond · Jul 6, 2017

Love_In_Rio said:
That's the result of one of the multiple overclocked results over 1630 MHz of the same card. The non-overclocked one is still slower than a stock 1080. So, still similar results to Vega FE.

How can you tell what clocks it was running at?

BacBeyond · Jul 6, 2017

Clukos said:
http://www.gamersnexus.net/guides/2977-vega-fe-vs-fury-x-at-same-clocks-ipc

Same performance in gaming (slight decrease, probably from slightly lower memory bandwidth), but much better in productivity tasks. Overall a very informative review though, thanks for posting!

gamervivek · Jul 6, 2017

Love_In_Rio said:
https://videocardz.com/70777/amd-radeon-rx-vega-3dmark11-performance

That's the result of one of the multiple overclocked results over 1630 MHz of the same card. The non-overclocked one is still slower than a stock 1080. So, still similar results to Vega FE.

Does he know the person(s) running these benchmarks? The top score shows only 1630 MHz, and the results have a 15% spread. That is too high for an overclock unless AMD has eked out another clock speed bump or the other cards were running substantially below 1630 MHz, in which case a 1630 MHz Vega is better than a stock 1080.

The two different benchmarks I've seen were on different CPUs, so that might easily affect a 720p benchmark.

hkultala · Jul 6, 2017

3dilettante said:
GCN has for generations defined a ceiling IPC of 1, with any gains found in multithreaded throughput or measures to avoid stalls that would drive instruction issue below 1. Most of the marketing has been about utilization of the hardware and some token measures for single-threaded performance. That Vega's marketing made such a pointed reference to IPC this time around has more implications in part because that's not what GCN has been about.

IPC == Instructions Per Cycle. But per thread or per core(CU)?

Each GCN core (CU) can fetch multiple instructions per cycle (from different threads).
It can also issue multiple instructions in the same clock cycle.

So, per core, the IPC can be >1.

Per thread it's limited to 1, because only a single instruction is fetched per thread.

CarstenS · Jul 6, 2017

BacBeyond said:
Same performance in gaming (slight decrease, probably from slightly lower memory bandwidth), but much better in productivity tasks. Overall a very informative review though, thanks for posting!

Productivity meaning SPECviewperf 12.1? Or did I miss more of the „serious“ tests?

FWIW, on our testing bench, we're basically tying their Fury X results. On that same bench, a FirePro W9100 scores 78,79 in SNX-02, the test whose result GamersNexus uses to assert Vega's vertex superiority over Fiji.

sebbbi · Jul 6, 2017

hkultala said:
Per thread it's limited to 1, because only a single instruction is fetched per thread.

GCN instructions are 64 wide and are executed in 4 cycles using a 16-wide SIMD. Maximum IPC is 1/4 per lane. A CU has four SIMDs, but these are independent (each executes a different set of waves). If you want a CU to execute 64 instructions (= 128 flops) per clock, you need to have four waves running on the CU (one per SIMD). This is 10% of the SIMD occupancy (max 10 waves per SIMD to hide latency). Fortunately all common instructions have a latency of 1, so a single wave per SIMD is actually enough to fully utilize the SIMD... assuming of course that there are no memory operations (including groupshared memory). GCN doesn't need high occupancy to fill the pipelines, it needs high occupancy to hide memory latency.
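The arithmetic in the post above can be checked with a few lines of Python. This is only a sketch: the constant and function names are made up for illustration, the figures are the ones quoted in the post, and the 128 flops figure assumes every lane operation is a fused multiply-add counted as 2 flops.

```python
# Sketch of the per-CU arithmetic from the post above.
# All names are illustrative; the figures are the ones quoted in the post.
WAVE_WIDTH = 64          # work-items in one GCN wave
SIMD_WIDTH = 16          # lanes in one SIMD
SIMDS_PER_CU = 4         # independent SIMDs in a CU
MAX_WAVES_PER_SIMD = 10  # wave slots per SIMD, used to hide latency
FLOPS_PER_LANE_OP = 2    # assumption: each lane op is an FMA, counted as 2 flops


def cycles_per_vector_instruction() -> int:
    """Cycles one SIMD needs to push a 64-wide instruction through 16 lanes."""
    return WAVE_WIDTH // SIMD_WIDTH


def peak_ipc_per_wave() -> float:
    """Upper bound on vector instructions per cycle for a single wave."""
    return 1 / cycles_per_vector_instruction()


def cu_lane_ops_per_clock() -> int:
    """Lane operations a whole CU can retire per clock with every SIMD busy."""
    return SIMD_WIDTH * SIMDS_PER_CU


def cu_flops_per_clock() -> int:
    """Peak flops per clock for one CU under the FMA assumption."""
    return cu_lane_ops_per_clock() * FLOPS_PER_LANE_OP


def occupancy(waves_per_simd: int) -> float:
    """Fraction of the wave slots on a SIMD that are in use."""
    return waves_per_simd / MAX_WAVES_PER_SIMD


if __name__ == "__main__":
    print("cycles per vector instruction:", cycles_per_vector_instruction())
    print("peak IPC per wave:", peak_ipc_per_wave())
    print("lane ops per clock per CU:", cu_lane_ops_per_clock())
    print("flops per clock per CU:", cu_flops_per_clock())
    print("occupancy with one wave per SIMD:", occupancy(1))
```

With one wave per SIMD the sketch gives the 1/4 rate, the 64 lane operations (128 flops) per clock and the 10% occupancy mentioned above. It says nothing about memory latency, which is the reason higher occupancy matters in practice.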

3dilettante · Jul 6, 2017

hkultala said:
IPC == Instructions Per Cycle. But per thread or per core(CU)?

As Sebbi noted, it's 1/4 for vector utilization. There are specific cases where the instruction buffer can churn through at 1, but those skip the rest of the pipeline. I was thinking in terms of what it logically appears as to the software, but IPC is more of a statement about what the implementation is actually doing. I must need more caffeine if I'm lapsing on that concept.

hkultala said:
Each GCN core (CU) can fetch multiple instructions per cycle (from different threads).
It can also issue multiple instructions in the same clock cycle.

So, per core, the IPC can be >1.

Not in the way the term IPC has been specifically used. It's effectively 1/4 per stream of execution through the pipeline. There are other figures for instruction throughput, but watering down the definition of IPC generally only happens when marketing needs to hide something.

leoneazzurro · Jul 6, 2017

I think the term "IPC" is improperly used here. For most people the right word would be "efficiency", that is, (effective calculations per cycle)/(maximum theoretical instructions per cycle).
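The ratio suggested above can be written as a one-line function. This is a minimal sketch: the function name, the parameter names and the sample numbers are hypothetical and are not measurements from any of the linked reviews.

```python
def efficiency(achieved_per_cycle: float, peak_per_cycle: float) -> float:
    """Achieved work per cycle divided by the theoretical maximum per cycle."""
    if peak_per_cycle <= 0:
        raise ValueError("peak_per_cycle must be positive")
    return achieved_per_cycle / peak_per_cycle


if __name__ == "__main__":
    # Hypothetical numbers: 48 lane operations retired per clock on a CU
    # whose theoretical peak is 64 lane operations per clock.
    print(efficiency(48, 64))
```

Expressed this way, the figure stays a utilization ratio between 0 and 1 and does not touch the question of how IPC itself should be defined.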
