AMD Vega Hardware Reviews

My take was that it had to do with how many waves were being launched; I haven't studied crypto in any detail, so I'm speculating there. If that's the case, and Vega significantly increased its VGPR capacity to accommodate far more active waves, it would be a consideration.
That miners tend to clock the core down while maximizing memory performance may indicate that tweaking GPRs would not yield as much of a difference as correcting whatever odd problem Fiji had with utilizing its HBM bandwidth--a problem not specific to mining. Perhaps the occupancy concerns of Ethereum can be elaborated on by those using it; my impression was that current hardware isn't that burdened by that consideration for mining.
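For what it's worth, here's the back-of-the-envelope I have in mind for how VGPR allocation bounds active waves per SIMD on GCN (a sketch using the usual GCN figures; the per-kernel VGPR counts are made-up examples, not anything measured from a miner):
```python
# Rough GCN occupancy sketch: how many waves one SIMD can keep resident
# given a kernel's VGPR allocation. The 256-registers-per-lane file and
# the 10-wave cap are the familiar GCN figures; the per-kernel VGPR
# counts below are assumed examples, not measured Ethash values.
VGPRS_PER_SIMD = 256       # 64 KiB file / (64 lanes * 4 bytes)
MAX_WAVES_PER_SIMD = 10    # architectural cap per SIMD

def waves_per_simd(vgprs_per_wave, granule=4):
    alloc = ((vgprs_per_wave + granule - 1) // granule) * granule
    return min(VGPRS_PER_SIMD // alloc, MAX_WAVES_PER_SIMD)

for vgprs in (24, 48, 84, 128):
    print(f"{vgprs:3d} VGPRs/wave -> {waves_per_simd(vgprs)} waves/SIMD")
# Doubling the register file would double the VGPR-limited figures,
# which is the scenario being speculated about above.
```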

That's not accounting for any changes to LDS mechanics or temporary registers (SIMD-local LDS?) that may need some explicit coding before hashing algorithms see gains.
There's a somewhat blurry die shot if you want to try finding evidence of things like a redistribution of LDS.

I don't know enough about the access patterns to really judge the impact, just that it's possible. It would matter if accesses were grouped or followed a parabolic/clustered distribution over time, as opposed to being completely random.
My impression of GPU mining, with Ethereum as the main example, is that it does tend towards unpredictable access. That makes device-local bandwidth important and penalizes, or renders useless, larger clustering strategies.
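To illustrate the kind of pattern I mean, here's a toy sketch of an Ethash-style lookup loop (not the real algorithm; the DAG size, constants and lookup count are placeholders):
```python
import random

# Toy sketch of an Ethash-style inner loop: each hash performs a series
# of data-dependent reads at effectively random offsets into a large DAG.
# Sizes and constants are placeholders, not the real Ethash parameters.
DAG_ITEMS = 2 * 1024 * 1024   # stand-in for a DAG far larger than any cache
LOOKUPS_PER_HASH = 64         # data-dependent lookups per hash

def toy_hash(seed, dag):
    mix = seed
    for _ in range(LOOKUPS_PER_HASH):
        idx = mix % DAG_ITEMS                        # next index depends on prior data
        mix = (mix * 1103515245 + dag[idx]) & 0xFFFFFFFF
    return mix

dag = [random.getrandbits(32) for _ in range(DAG_ITEMS)]
print(hex(toy_hash(12345, dag)))
# Because consecutive idx values are essentially uncorrelated, caches and
# "cluster the hot data" strategies get little traction; raw device-local
# bandwidth is what ends up mattering.
```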

Where HBCC could page in active areas and maintain a more even distribution and throughput.
My point is that I think it, by design, does not demand capacity sufficient to need paging.
 
Yeah, so why is Fiji performing like a 390X, and the 1080 Ti with GDDR5X isn't that good either... It's more complex than just pure BW.
While I can't speak as much to the 1080 Ti, there were signs in some memory bandwidth tests that Fury's memory hierarchy couldn't really exercise HBM's bandwidth advantage until after the on-die cache hierarchy was exhausted. Prior to that, Fury's L1-to-L2 behavior looked similar to Hawaii's (except slightly slower) and hadn't scaled with channel count the way it used to.

Even if that doesn't apply to Ethereum mining: I tried some comparisons between Hawaii and Fiji back then, and one area where the HBM implementation hadn't scaled was the total number of DRAM banks across all devices on the board.
DRAM is affected by bank conflicts and various latencies that can be worsened by scattered or random memory accesses. Frequently hitting new banks can cause faster or higher-bandwidth DRAM to bottleneck on those latencies. They are more a limitation of the DRAM arrays (which vary much less than the interfaces do) or the devices as a whole, and they can leave a nominally higher-bandwidth bus performing worse than the channel count or bandwidth numbers would suggest. In that regard, the overall population of DRAM arrays might have been hit by some particular corner-case behavior of the workload.
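A crude way to see the bank-count effect: model each access as landing in a random bank and count how often it re-hits a bank that is still inside its row-cycle time (bank counts and the busy window below are illustrative, not Hawaii's or Fiji's real figures):
```python
import random

# Crude model: random accesses land in random banks; an access to a bank
# that is still "busy" (within its row cycle after the last activate) has
# to wait. Bank counts and the busy window are illustrative placeholders.
def conflict_rate(total_banks, accesses=200_000, busy_window=8):
    last_use = [-10**9] * total_banks
    conflicts = 0
    for t in range(accesses):
        bank = random.randrange(total_banks)
        if t - last_use[bank] < busy_window:
            conflicts += 1
        last_use[bank] = t
    return conflicts / accesses

for banks in (64, 128, 256, 512):
    print(f"{banks:4d} banks -> {conflict_rate(banks):.1%} of accesses hit a busy bank")
# More total banks spread the same random stream thinner, so fewer
# accesses stall on a bank that hasn't finished its row cycle yet.
```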

In theory, HBM2's pseudo-channel mode and higher stack counts could either reduce the cost of bank activation or at least increase the total number of banks available. Perhaps that might help extract more accesses before hitting a device limitation, although GDDR5X has something similar for reducing bank activation costs.
 
There's a somewhat blurry die shot if you want to try finding evidence of things like a redistribution of LDS.
I'm saying the capacity could have changed, or the scheduling frequency, etc.--for example, if LDS operations could be sustained every cycle for all SIMDs. I have a feeling the per-SIMD concept is there, but the capacities would likely be rather small, on the order of 1 KB, so it would be rather difficult to spot.

METHOD AND APPARATUS FOR INTER-LANE THREAD MIGRATION

Here's a fun one. I'm surprised these are getting published if they're not Vega.

That assumes stock memory clocks. The usual core-down, memory-up clocking would have increased the bandwidth substantially--around 1100 MHz on the FE, so roughly 16% higher. That would leave Vega 64 at ~75 MH/s with linear scaling on bandwidth. No idea how all the cache could affect that, and no idea how well the 4-Hi stacks overclock either.
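For reference, the arithmetic behind that (945 MHz being the FE's stock HBM2 clock; the stock hashrate is simply back-calculated from the ~75 MH/s figure, not a measurement):
```python
# Back-of-the-envelope for the numbers above (not measurements).
stock_hbm2_mhz = 945     # Vega FE stock memory clock
oc_hbm2_mhz = 1100       # overclock mentioned above
scale = oc_hbm2_mhz / stock_hbm2_mhz
print(f"memory clock gain: {scale - 1:.1%}")          # ~16%
print(f"implied stock rate: {75 / scale:.1f} MH/s")   # ~64 MH/s if 75 MH/s overclocked
```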
 
Still holding out on the segmentation of each 64 KiB register file into two blocks?
https://www.techpowerup.com/reviews/AMD/Vega_Microarchitecture_Technical_Overview/images/arch-13.jpg
Boy, how about the 32-wide SIMDs shown graphically compared to the text saying SIMD16?
Hard to tell what they meant by doing this, although one interpretation, for both the ALUs and the register files, is that AMD is drawing blocks that can operate as 32-wide or 2x16 as two blocks. The SIMD-16 descriptor would remain consistent either way.

Actually having that many more registers would be a notable change, and it could help account for the earlier slide stating there was over 45 MB of SRAM. It might almost have the opposite problem, where the storage in the rest of the GPU would add up to a fair amount more than that number, although the slide didn't give 45 MB as a ceiling.
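Just to put rough numbers on that (the per-CU sizes are the familiar GCN/Vega figures; the doubled register files are the speculation, and everything outside registers, LDS, L1 and L2 is left as an unknown remainder):
```python
# Rough SRAM tally for a 64-CU Vega 10, in MiB. Per-CU figures are the
# usual GCN/Vega ones; "doubled" register files are the speculation being
# discussed, not a confirmed change.
CUS = 64
KIB = 1024

vgpr_per_cu = 4 * 64 * KIB        # 4 SIMDs x 64 KiB register file
lds_per_cu = 64 * KIB             # local data share
l1_per_cu = 16 * KIB              # vector L1 cache
l2_total = 4 * 1024 * KIB         # 4 MiB L2

def mib(nbytes): return nbytes / (1024 * 1024)

baseline = CUS * (vgpr_per_cu + lds_per_cu + l1_per_cu) + l2_total
doubled = CUS * (2 * vgpr_per_cu + lds_per_cu + l1_per_cu) + l2_total

print(f"baseline registers+LDS+L1+L2: ~{mib(baseline):.0f} MiB")   # ~25 MiB
print(f"with doubled register files:  ~{mib(doubled):.0f} MiB")    # ~41 MiB
# Parameter caches, scalar/K$ caches, ROP and geometry buffers, queues,
# etc. would sit on top of either figure, which is where the ">45 MB"
# slide could land in the doubled case.
```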
 
Still holding out on the segmentation of each 64 KiB register file into two blocks?
https://www.techpowerup.com/reviews/AMD/Vega_Microarchitecture_Technical_Overview/images/arch-13.jpg
Boy, how about the 32-wide SIMDs shown graphically compared to the text saying SIMD16?
Wow, hadn't even noticed that. Yeah, I think I'll stick with the double banks for now, although I was assuming two banks of 64 KB, or at the very least a net increase in capacity. The reasoning behind the double banks was that, with alternating wave scheduling, it should remove the RF from being a bottleneck.
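Purely as a sketch of what I mean by that (a toy illustration of the speculation, not a description of real hardware):
```python
# Toy illustration of the "two RF banks + alternating wave issue" idea:
# even waves live in bank 0, odd waves in bank 1. If the scheduler
# alternates which wave issues each cycle, each bank is only read every
# other cycle, so each bank needs half the port bandwidth.
CYCLES = 8
for cycle in range(CYCLES):
    wave = cycle % 2              # alternate between two resident waves
    bank = wave % 2               # wave parity picks the register bank
    print(f"cycle {cycle}: issue wave {wave} -> reads RF bank {bank}")
# Each bank shows up on alternating cycles only, which is the sense in
# which double banking could keep the RF from being the issue bottleneck.
```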

We'll have to wait and see what they did, but 12 TFLOPs of just MULs isn't impossible. That theory also had an overloaded scalar unit leaving the SIMDs with only core math functionality to keep their size down. Assuming the slide isn't wrong, or isn't just showing packed math, the whitepaper on this is going to be interesting to say the least.
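The arithmetic behind the "not impossible" (the ~1.5 GHz is just Vega 64's ballpark boost clock; which reading is right is exactly what's being speculated about):
```python
# Two ways to reach roughly the same TFLOP figure (ballpark numbers only).
clock_ghz = 1.5

# Conventional reading: 4096 FMA lanes, FMA counted as 2 FLOPs/cycle.
fma_lanes = 4096
print(fma_lanes * clock_ghz * 2 / 1000, "TFLOPs (4096 lanes, FMA)")       # ~12.3

# Speculative reading above: twice the lanes doing single-op MULs.
mul_lanes = 8192
print(mul_lanes * clock_ghz * 1 / 1000, "TFLOPs (8192 lanes, MUL only)")  # ~12.3
```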

Hard to tell what they meant by doing this, although one interpretation, for both the ALUs and the register files, is that AMD is drawing blocks that can operate as 32-wide or 2x16 as two blocks. The SIMD-16 descriptor would remain consistent either way.
Maybe, although they also label a single ALU as "Vector ALU 32-bit / 2x16-bit". Especially for the registers, why would "packed" registers ever be drawn as two blocks? That doesn't even consider the 4xINT8 case. Given the lane-migration patent I linked above, scheduling two instructions from complementary waves could be a possibility. It goes back to some of those papers we examined a while ago.
 
I apologize if I missed that, but do the tech sites already have the cards and when are we expecting reviews to be published?
 
This is a piece of marketing material in a press deck, not a diagram in a technical paper... It is not too hard to imagine "Oh, so one 32-bit register can act like two 16-bit registers now? Let's draw it as two blocks then." being the layman's reasoning.
Maybe, but marketing has been pushing packed math, not the packed buffers that would have already existed with Polaris. Splitting some of the hardware for double-rate operation wouldn't be an unreasonable way to try to increase clock speeds. Either way, we can't confirm anything with the current information.
 
I apologize if I missed that, but do the tech sites already have the cards and when are we expecting reviews to be published?

On a similar theme, does anyone actually know when Vega will be available, and whether both the 64 and 56 variants will be released together? I've read conflicting reports on this. Also, are there any dates for the AIBs' custom cards?
I think an Asus Strix or MSI Gaming X would be the ones to go for. That said, even if they aren't popular with miners, I can't see myself being able to pick up a Vega 56 for £400-£450 given the current prices of RX 580s and GTX 1070s.
 
With the ETH difficulty going up and up, I think the mining stuff will slow down a bit (even if the ETH value is rising, the difficulty is going up way faster). ZEC difficulty is stable, but the value is not so good right now... Anyway, I believe pricing will be OK.
But I'm curious about reviews too. When should they appear?
 
On a similar theme, does anyone actually know when Vega will be available, and whether both the 64 and 56 variants will be released together? I've read conflicting reports on this. Also, are there any dates for the AIBs' custom cards?
I think an Asus Strix or MSI Gaming X would be the ones to go for. That said, even if they aren't popular with miners, I can't see myself being able to pick up a Vega 56 for £400-£450 given the current prices of RX 580s and GTX 1070s.
Reference models will be available on the 14th of this month, so next Monday. The Asus Strix will be available sometime in September.
 
Bearing in mind how badly built and over-priced all Asus stuff is these days, steer clear. Do your own research, don't just buy bling.
Hate to say it, but I totally agree with Jawed. Asus has given me a LOT of headaches over the last few years; the quality seems to go down as the prices go up. :(
 
Hate to say it, but I totally agree with Jawed. Asus has given me a LOT of headaches over the last few years; the quality seems to go down as the prices go up. :(
What do you recommend, then?
I haven't updated my computer since I was working at IMG some 5 or 6 years ago; what motherboard manufacturer should I trust?
(Gigabyte, maybe?)
 
My X99-A (socket 2011-3) has been perfect. No problems with it, so I still trust Asus for motherboards. Beyond that, I have to say I don't follow their reputation for other products.
 
I've been impressed with EVGA's quality in video cards (ball-bearing fans FTW) and PSUs. I haven't had one of their motherboards, though.
 