AMD Vega Hardware Reviews

Is HBCC materially different in power consumption from a traditional memory controller? Or just "not for free", as in maybe a 5%-ish difference?
Not free in area and power. I have zero clue about how much, but I would very much like to know.


So Vega is basically a 1080 with worse power efficiency... a year later. Even at a slightly better price, why would someone choose this over a 1080, exactly?
1 - FreeSync. On a FreeSync monitor, a Vega running at 45-50FPS will behave practically as well as a 1080 Ti would with a 60Hz cap. G-Sync monitor prices are off the charts, and 144Hz (or anything above 60-75Hz, really) is for esports and little else IMO.
2 - When (or you could argue if) FP16 pixel shaders start being used in idTech6+Vulkan (Bethesda deal) and Frostbite and other games, a significant performance boost over older architectures is expected.
3 - FreeSync
4 - Price/performance (according to the same MSI employee who commented on RX Vega's power consumption).
5 - Did I mention FreeSync?

Higher power consumption is a complete non-issue for people who have jobs and/or family and/or leave the house, and therefore can't spend 8 hours a day playing video games.
People who are so hopelessly fixated on power consumption should IMO get a console instead; there's a lot more visual bang-for-the-watt there.


I get that up until Vega's release, getting a GTX 1070 -> 1080 -> 1080 Ti was the only rational choice for anyone in search of a high-end card. Vega being released at a price matching its performance changes that.
 
???

The perf/watt is still quite sad. The power consumption in itself isn't a huge issue (for gamers), but what comes with it is: heat, noise, and limits. Not to mention most potential consumers at its performance tier will have already picked up a card... 7nm cards are probably less than a year away.... One would have to have a pretty skewed perspective to view it as a good value/investment. Professionals will certainly be interested in it, but beyond that I wouldn't expect any sales records.
 
7nm cards are probably less than a year away....

How is "less than a year away" a valid reason to withhold a graphics card purchase?
Why wait for 7nm when 7nm EUV is just 6 months after that?
Heck, let's just wait for graphene chips instead.
 
2*FP16 probably doesn't come for free in the power and area departments, and neither does HBCC.
I doubt it costs very much. They might be able to just reuse the FP32 logic at twice the clock with half of it gated off. Also consider that arithmetic logic grows much faster than linearly with operand width, so an FP16 unit is tiny if it isn't supporting the full instruction set. FP16 could also be accelerating programmable HDR, so there's that aspect to consider.
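To make the packed-math idea concrete, here is a minimal Python/NumPy sketch of two FP16 values sharing one 32-bit lane, with one "op" producing two results. It is purely illustrative (the helper names are made up), assumes a little-endian host, and is not a model of Vega's or Pascal's actual datapath.

```python
# Illustrative sketch of "packed math": two FP16 values live in one 32-bit
# lane, and a single operation processes both halves.
import numpy as np

def pack_f16_pair(lo, hi):
    """Pack two float16 values into one uint32 lane (lo in the low half)."""
    pair = np.array([lo, hi], dtype=np.float16)
    return pair.view(np.uint32)[0]

def packed_add(a, b):
    """Add two packed-FP16 lanes halfwise: one 'instruction', two results."""
    a_halves = np.array([a], dtype=np.uint32).view(np.float16)
    b_halves = np.array([b], dtype=np.uint32).view(np.float16)
    return (a_halves + b_halves).view(np.uint32)[0]

x = pack_f16_pair(1.5, 2.0)
y = pack_f16_pair(0.25, -1.0)
result = np.array([packed_add(x, y)], dtype=np.uint32).view(np.float16)
print(result)  # two FP16 results from one lane: 1.75 and 1.0
```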

HBCC probably saves power if it lets the card get by with half the memory, not to mention the advantages of using virtual pages for everything. It may make some driver work easier now, but it also opens the door to more flexible programming in the future. It should provide the hardware features for dynamic memory management, as with CPUs, so shader-controlled dynamic memory allocations become possible.
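As a purely conceptual sketch of the "local memory as a page cache over a larger virtual allocation" idea (this is not AMD's HBCC implementation, driver interface, or page granularity; every name and number below is made up for illustration):

```python
# Conceptual sketch: local HBM acts as a small page cache over a larger
# virtual allocation. Pages migrate in on first touch and are evicted LRU
# to system memory when the cache is full.
from collections import OrderedDict

PAGE_SIZE = 64 * 1024            # assumed page granularity, illustrative only
LOCAL_PAGES = 4                  # pretend local memory holds just 4 pages

class PageCache:
    def __init__(self, capacity=LOCAL_PAGES):
        self.capacity = capacity
        self.resident = OrderedDict()            # page number -> resident flag

    def access(self, address):
        page = address // PAGE_SIZE
        if page in self.resident:
            self.resident.move_to_end(page)      # hot page stays resident
            return "hit"
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)    # evict LRU page to system memory
        self.resident[page] = True               # migrate page into local memory
        return "miss"

cache = PageCache()
for addr in (0x0000, 0x20000, 0x40000, 0x60000, 0x80000, 0x0000):
    print(hex(addr), cache.access(addr))
```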

No. Vega's problem is that it does not have the efficiency (perf/power) to compete with any Pascal chip. Vega FE is already hitting the power limit hard; that's why you see the AiO card. And Vega RX will have the same problem if the rumors about an AiO version are true. It is basically Fiji 2.0.
We don't know what the true efficiency is until the hardware is properly enabled. There are viable techniques that could significantly affect performance and power. Some issues seem more thermal than power.

At least they're not showing it against the Titan anymore; being more realistic? Apparently adaptive-sync and G-Sync were also being used, and the monitor was blocked, so there was no way to determine the refresh range. Doesn't look like the resolution was mentioned either. If so much has to be hidden and you're locking things down to a likely 45-70ish fps, I'm not sure doing these tours is even worth it.
I'd assume they booked the venues a while ago. AMD did say Titan was the performance target with FE.

I saw a commit yesterday about temporarily disabling all VGPR indexing because of a compiler issue with LLVM. That was directly affecting GS stream-out, among other things. If they were designing around some form of register paging, that would likely be affected as well. Register pressure could cause the current issues and affect bandwidth, in addition to the thermal problems and early power profiles.
 
Vega seems to be the best offering above $300 at the moment.
$300 for a 500mm² chip with HBM2?

Forget it. Make it $600 and AMD is maybe making a profit. Everything below $500 will basically ruin the margin for AMD.

We don't know what the true efficiency is until the hardware is properly enabled. There are viable techniques that could significantly affect performance and power. Some issues seem more thermal than power.
thermal = power

You cannot dissipate 400W from a graphics card, even with AiO watercooling. NVIDIA has understood that since the development of Kepler. And since then, every generation has made the gap between AMD and NVIDIA bigger and bigger, even though AMD keeps promising to catch up.
 
Now we know why NVIDIA slashed 1080 prices and upgraded its memory bandwidth to 11Gbps: they were in the loop. They did the same for the 1060 in anticipation of the RX 580.
 
I predict the water-cooled version of Vega will need to be no more than $599,
the air-cooled at $499,
and the cut-down version at $399.
 
I doubt it costs very much. They might be able to just reuse the FP32 logic at twice the clock with half of it gated off. Also consider that arithmetic logic grows much faster than linearly with operand width, so an FP16 unit is tiny if it isn't supporting the full instruction set. FP16 could also be accelerating programmable HDR, so there's that aspect to consider.
Then why is 2*FP16-per-ALU absent from every single consumer Pascal graphics chip, despite nvidia implementing it in TX1 and GP100?


There's obviously a caveat to upgrading FP32 ALUs to support dual FP16 ops in parallel. Unless you think nvidia left it out of its consumer Pascal GPUs just to pressure customers into upgrading once FP16 ops become widespread in AAA PC games.
 
I did some extremely approximate pixel counting from a Polaris 10 die shot and that marketing depiction of Vega.
I'm not entirely sure which blocks are which for the Vega item, and some uncertainties on my part exist for certain parts of Polaris like the RBEs.

I think in the ballpark of 65% of Polaris would be units relevant to the synthetics run (CU arrays, front end, geometry, ROPs, L2).
Vega, if guessing that the lower edge is memory PHY and controllers and the filled-in rectangles correspond to similar hardware in Polaris, takes that to 85%.
This seems to make intuitive sense since the portion of area for various supporting units or media blocks doesn't need to double over Polaris, and the GDDR5 area is a major presence around the chip versus the one edge of HBM2 for Vega.

Roughly, I think the CU arrays, the strip of geometry and command processors in the center, the L2 area, and the RBEs (maybe???) have ratios as follows:

Polaris: .45?, .09?, .05??, .06???
Vega: .64?, .09?, .05??, .08???
Each ? indicates how wildly off I think I could be.

I think what is least ambiguous is the area taken up by the CU arrays relative to Polaris, assuming a 232mm² Polaris and a 484mm² Vega.

For Vega, even though the front end hardware is the same fraction, that's going on twice the area.
Non-culled throughput seems to track all right with that, despite probably including non-geometry elements like the command processors in that area. It's short of ideal scaling, but ideal scaling has proven thorny to actually hit.
Culling throughput, however, seems to be where the area devoted versus the 580 is currently not paying off.

The uncertainty around the area for the ROPs is too much to be clear, although the synthetic seems to track with twice the ROPs.

L2 estimate might be iffy for Polaris, but it might mean Vega roughly doubles capacity if the ratio is maintained.

The CU area is significant, and at least right now the synthetics and performance don't seem to mesh with the area devoted. It's something in the area of 3x the area for the arrays, so the CUs seem to be larger or have some other associated array space cost.
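For anyone who wants the arithmetic behind those ratios spelled out, here is the back-of-the-envelope calculation using the same rough fractions and the assumed 232mm²/484mm² die sizes (no new data, just the guesses above multiplied out):

```python
# Back-of-the-envelope area arithmetic from the rough fractions above.
polaris_mm2, vega_mm2 = 232, 484

fractions = {            # CU arrays, geometry/command, L2, RBEs
    "Polaris": (0.45, 0.09, 0.05, 0.06),
    "Vega":    (0.64, 0.09, 0.05, 0.08),
}

blocks = ("CU arrays", "geometry/cmd", "L2", "RBEs")
polaris_area = [f * polaris_mm2 for f in fractions["Polaris"]]
vega_area = [f * vega_mm2 for f in fractions["Vega"]]

for name, p, v in zip(blocks, polaris_area, vega_area):
    print(f"{name:13s} Polaris {p:6.1f} mm^2  Vega {v:6.1f} mm^2  ratio {v / p:.2f}x")
# CU arrays come out around 104 mm^2 vs ~310 mm^2, i.e. roughly 3x.
```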
 
Hmm, so I found that the 2nd BIOS switch position on FE Air allows the fan to run to max speed. It would be nice if AMD had mentioned that somewhere. I only guessed it since I had a 290, which did the same thing.

On that note, it would have made a lot more sense to have the 1st position limit clocks and voltage instead of trying and failing to run at faster speeds (default switch position limits fan to 2000rpm). I think most people running in quiet mode would prefer maintaining a consistent level of reduced performance and power consumption instead of the card going clock crazy.
 
I doubt it costs very much. They might be able to just reuse the FP32 logic at twice the clock with half of it gated off. Also consider that arithmetic logic grows much faster than linearly with operand width, so an FP16 unit is tiny if it isn't supporting the full instruction set. FP16 could also be accelerating programmable HDR, so there's that aspect to consider.
AMD themselves call it "packed math", don't they? And in Nvidia's case it's also 2x throughput, but 1x instructions.

Hmm, so I found that the 2nd BIOS switch position on FE Air allows the fan to run to max speed. It would be nice if AMD had mentioned that somewhere. I only guessed it since I had a 290, which did the same thing.

On that note, it would have made a lot more sense to have the 1st position limit clocks and voltage instead of trying and failing to run at faster speeds (default switch position limits fan to 2000rpm). I think most people running in quiet mode would prefer maintaining a consistent level of reduced performance and power consumption instead of the card going clock crazy.
You can change that fan behaviour in gaming mode's wattman without ever touching the BIOS switch.

--
On another note: with good air cooling (Morpheus II), and thus a pretty much constant 1600 MHz, plus a memory OC to 1100, ethminer 0.11 ran at 37 MH/s in the afternoon heat.
 
You cannot dissipate 400W from a graphics card, even with AiO watercooling. NVIDIA has understood that since the development of Kepler. And since then, every generation has made the gap between AMD and NVIDIA bigger and bigger, even though AMD keeps promising to catch up.
Hasn't been a problem in the past. I think most large, overclocked cards are in that range. Dissipating heat is easy, since cooling effectiveness is a function of the temperature delta over ambient. Staying within a thermal limit is another matter. It's the same reason CPUs at 40°C had larger coolers than GPUs at 90°C. The concern with Vega would be the stacked HBM insulating itself.
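A quick steady-state arithmetic sketch of that point: at a fixed power, the allowable delta over ambient sets the thermal resistance the cooler has to hit, so a higher temperature target makes the cooling job easier. The ambient and die-temperature figures below are illustrative assumptions, not measurements of any particular card.

```python
# Steady state: delta-T = P * R_th, so the cooler's total thermal resistance
# must not exceed delta-T / P.
def required_thermal_resistance(power_w, t_target_c, t_ambient_c):
    """Maximum allowable thermal resistance (C/W) for a given power and temps."""
    return (t_target_c - t_ambient_c) / power_w

print(required_thermal_resistance(400, 90, 25))  # ~0.16 C/W at a 90 C target
print(required_thermal_resistance(400, 70, 25))  # ~0.11 C/W: tighter limit, much harder
```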

I wouldn't wish this "just" task on any engineer. :)

There is nothing easy about it.
The logic behind it wouldn't be overly difficult. Technically you could just increase the clocks until half the logic has an indeterminate state from propagation delay, which you ultimately ignore, at least for INT. Then add a simple crossbar that swizzles the first and second halves of the input and output.

The "hard" part would be timing, as it would likely constrain your clock speed. There's only so much you can do to make gates switch faster.

Then why is 2*FP16-per-ALU absent from every single consumer Pascal graphics chip, despite nvidia implementing it in TX1 and GP100?
Product differentiation, so they can charge more. It will likely reduce clocks a few percent as well, so there is some downside. Even 4x FP16 I would think is possible, but feeding that beast would take some work.
 