AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

AMD's Mike Mantor states that ALL Vega SKUs will have HBCC enabled. But when asked whether notebook parts or APUs will have it, he said not necessarily. He says other memory types could be used in situations where HBM doesn't make sense.

Then what is HBCC? Why call it "High Bandwidth Cache Controller"? Does it just refer to the fact that it can use system memory and even fast storage as virtual memory, and that it handles this better?
In some cases there wouldn't be an actual cache to page into, so paging would be pointless. It's simpler to just provide a reference to the original memory location. The alternative would be a memcpy into a private pool, which would waste bandwidth and capacity that are likely in short supply.
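A rough sketch of the two options in C, with made-up names (this is not any actual driver interface), just to show why the copy is the worse deal:

Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical illustration only -- not an actual driver interface. */

/* Option A: no private cache pool, so just hand back a reference to the
   original allocation and let the GPU access it in place over the bus. */
static void *map_in_place(void *sys_buf)
{
    return sys_buf;               /* no extra capacity used, no copy traffic */
}

/* Option B: stage the data into a private pool first. This burns a full
   copy's worth of bus bandwidth and ties up capacity in a pool that is
   likely already in short supply. */
static void *copy_to_private_pool(const void *sys_buf, size_t bytes)
{
    void *staged = malloc(bytes); /* stand-in for a private/local pool */
    if (staged)
        memcpy(staged, sys_buf, bytes);
    return staged;
}

int main(void)
{
    char resource[256] = "some resource sitting in system memory";
    void *ref  = map_in_place(resource);
    void *copy = copy_to_private_pool(resource, sizeof(resource));
    printf("reference: %p, staged copy: %p\n", ref, copy);
    free(copy);
    return 0;
}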
 
Then what is HBCC? Why call it "High Bandwidth Cache Controller"? Does it just refer to the fact that it can use system memory and even fast storage as virtual memory, and that it handles this better?
HBCC's use case across Vega's range is hardware management of multiple memory spaces, each with different characteristics such as latency, bandwidth, and read/write granularity. Ideally, it can almost fully manage, or at least augment, software in moving data between pools in a way that maximizes the achievable bandwidth, while being careful with media like NAND flash that have large block sizes and limited endurance.

Part of this is the very large address space, which HBCC seems able to scale up to manage. That might mean memory on a remote node or across a network.

For consumer discrete Vega needs, a lot of this capability has no relevance: the controller is managing movement across the lower bandwidth and higher latency of the PCIe bus to system RAM.
A consumer APU would just have system RAM, so there's no separate pool with different properties.
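A toy model of the kind of policy being described, purely illustrative (the pool names and migration rules are my own invention, not anything AMD has documented):

Code:
#include <stdbool.h>
#include <stdio.h>

/* Toy model: hardware-managed page residency across pools with different
   properties. Everything here is made up for illustration only. */

#define NUM_PAGES 16

enum pool { POOL_HBM, POOL_SYSRAM, POOL_NAND };

struct page_state {
    enum pool where;   /* which pool currently holds the page */
    bool      dirty;   /* needs write-back before eviction */
};

static struct page_state table[NUM_PAGES];

/* On a GPU-side access, pull the page into the fast local pool if it isn't
   already there; read from system RAM at page granularity, but from NAND in
   large blocks to respect its granularity and endurance limits. */
static void touch(unsigned page, bool write)
{
    if (table[page].where != POOL_HBM) {
        printf("migrate page %u into HBM from %s\n", page,
               table[page].where == POOL_NAND ? "NAND (large-block read)"
                                              : "system RAM");
        table[page].where = POOL_HBM;
    }
    if (write)
        table[page].dirty = true;   /* must be written back on eviction */
}

int main(void)
{
    for (unsigned i = 0; i < NUM_PAGES; i++)
        table[i].where = (i % 3 == 0) ? POOL_NAND : POOL_SYSRAM;
    touch(0, false);
    touch(1, true);
    return 0;
}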
 
Then what is HBCC? Why call it "High Bandwidth Cache Controller"? Does it just refer to the fact that it can use system memory and even fast storage as virtual memory, and that it handles this better?

Not sure if it's been mentioned, but if you enable 'HBCC Memory Segment' in the current drivers, the driver is reloaded and the card's "physical memory" as reported by Windows increases to whatever you set it to. Wish I had the slightest clue about what it actually does :(
 
Not sure if it's been mentioned, but if you enable 'HBCC Memory Segment' in the current drivers, the driver is reloaded and the card's "physical memory" as reported by Windows increases to whatever you set it to. Wish I had the slightest clue about what it actually does :(
It would be the amount of video memory reported to applications, with the HBCC paging data into VRAM from there. So if you set it to 64GB, applications think you have a card with 64GB of VRAM and allocate accordingly. For best performance, you likely want it set higher than the actual VRAM, above the upper limit an application would preload, while not running out of system memory in the process. Expect quickly diminishing returns, though.
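Some back-of-the-envelope numbers on why you'd want it above actual VRAM but not absurdly high (the 8GB/64GB figures and the 80% engine budget are just example assumptions):

Code:
#include <stdio.h>

/* Example numbers only: 8GB of physical HBM2, HBCC segment set to 64GB,
   and an engine that budgets 80% of reported video memory for residency. */
int main(void)
{
    double physical_vram_gb = 8.0;
    double reported_vram_gb = 64.0;
    double app_budget_gb    = 0.8 * reported_vram_gb;

    printf("App tries to keep ~%.1f GB resident\n", app_budget_gb);
    printf("HBCC has to page the remaining %.1f GB in and out of system RAM\n",
           app_budget_gb - physical_vram_gb);
    return 0;
}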
 
I simply take it that, for a given memory setup, the HBCC provides bandwidth amplification rather than latency reduction. They can always make the prefetching and prediction more aggressive later on if needed.
 
HBCC's use case across Vega's range is hardware management of multiple memory spaces, each with different characteristics such as latency, bandwidth, and read/write granularity. Ideally, it can almost fully manage, or at least augment, software in moving data between pools in a way that maximizes the achievable bandwidth, while being careful with media like NAND flash that have large block sizes and limited endurance.

Part of this is the very large address space, which HBCC seems able to scale up to manage. That might mean memory on a remote node or across a network.

For consumer discrete Vega needs, a lot of this capability has no relevance: the controller is managing movement across the lower bandwidth and higher latency of the PCIe bus to system RAM.
A consumer APU would just have system RAM, so there's no separate pool with different properties.

Very interesting. Thanks for that.
 
BTW, I see 14nm and 14nm+ in this slide:

...


Does that mean we could see a "refreshed" Vega on 14nm+ at some point? Is that even a thing at GlobalFoundries?

...

Not according to AnandTech's latest article on GF's roadmap. Though it is indeed strange to see GF offering the exact same flagship process for 2.5 years.
My guess is there's some ongoing process optimization happening between 2016 and 2018, and AMD probably knows better than anyone else the point at which they can call it "14nm+" and start making new chips that specifically take advantage of it.

https://videocardz.com/71232/amd-ryzen-5-2500u-with-radeon-vega-graphics-spotted

AMD Ryzen 5 2500U – Quad-core APU with Vega
If the first digit stands for "generation", "2" could mean second generation. And there is a rumor (!) that Raven Ridge APUs will be based on Pinnacle Ridge (Zen+).
 
If the first digit stands for "generation", "2" could mean second generation. And there is a rumor (!) that Raven Ridge APUs will be based on Pinnacle Ridge (Zen+).

There's also the fact that the APUs Great Horned Owl (4-core/8-thread + 11 Vega NCUs) and Banded Kestrel (2-core/4-thread + 3 Vega NCUs) were being described as having a "Next-generation Zen Core" back in February 2016.
In the same set of slides, Threadripper (Snowy Owl) and Epyc (Naples) were shown as having just Zen cores, without the "next-generation" prefix.

Perhaps this is the origin of that rumor, though.
 
The cryptocurrency mining market is probably too volatile for this idea to make much sense, but I wonder whether a Vega 10 with four stacks of HBM2 and ~1TB/s of bandwidth might appeal to miners enough to make the development costs worth it. I suppose a few other markets might like it too.
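Rough math behind the ~1TB/s figure, assuming 1024-bit stacks at 2.0Gbps per pin (the upper end of HBM2 speeds):

Code:
#include <stdio.h>

/* Rough bandwidth math for the four-stack idea. The 2.0Gbps/pin figure is
   an assumption at the upper end of HBM2 speeds. */
int main(void)
{
    int    stacks         = 4;
    int    bits_per_stack = 1024;   /* HBM2 interface width per stack */
    double gbps_per_pin   = 2.0;    /* assumed data rate */

    double gb_per_s = stacks * bits_per_stack * gbps_per_pin / 8.0;
    printf("Aggregate bandwidth: %.0f GB/s\n", gb_per_s);   /* ~1024 GB/s */
    return 0;
}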
 
The cryptocurrency mining market is probably too volatile for this idea to make much sense, but I wonder whether a Vega 10 with four stacks of HBM2 and ~1TB/s of bandwidth might appeal to miners enough to make the development costs worth it. I suppose a few other markets might like it too.

Vega 20 is Vega 10 + four stacks of HBM2 + 1/2-rate FP64:

d5WZzPi.jpg


There's a Vega 10X2 in there, but those may not be any better than just purchasing two Vegas. I'm guessing it's a Pro Duo successor.
 
A consumer APU would just have system RAM, so there's no separate pool with different properties.
This is not true, at least for all the APUs I am aware of (up to Kaveri). The graphics aperture is still a thing, and it has a different interleaving granularity from regular system memory (page-level vs. 256B).
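A quick illustration of what the granularity difference means for which channel an address lands on, assuming a simple two-channel modulo mapping (real address hashing is more involved):

Code:
#include <stdint.h>
#include <stdio.h>

/* Which DRAM channel a physical address lands on under a simple two-channel
   modulo mapping; only meant to show what the granularity difference means. */
static unsigned channel_of(uint64_t addr, uint64_t granularity, unsigned channels)
{
    return (unsigned)((addr / granularity) % channels);
}

int main(void)
{
    uint64_t addr = 0x12345A00;
    /* Regular system memory: fine 256B interleave spreads even small
       streams across both channels. */
    printf("256B interleave -> channel %u\n", channel_of(addr, 256, 2));
    /* Graphics aperture: page-level interleave keeps a whole 4KB page on
       one channel. */
    printf("4KB interleave  -> channel %u\n", channel_of(addr, 4096, 2));
    return 0;
}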
 
This is not true, at least for all the APUs I am aware of (up to Kaveri). The graphics aperture is still a thing, and it has a different interleaving granularity from regular system memory (page-level vs. 256B).
It's still the same DRAM over the same bus, however, barring the introduction of an APU with separate memory. Mapping accesses at that granularity is already handled by less complex hardware, and the lack of a need may explain why AMD hasn't mentioned HBCC for its APUs.

AMD could optimize away a fair amount of the HBCC's functions that would not be exercised, in favor of the CPU memory controllers that AMD claims to have improved for mixed client loads, plus the integrated chipset's system functions and generic storage management.

While not ruling it out, I saw HBCC's functionality presented as a form of page/residency/writeback management, but didn't see whether it would reformat or transform the data within those pages. It may not be trusted to interact that directly once the data is in the domain of the host processor's virtual memory, and just creating a region for it to copy and reformat data into may not play well with systems that cheap out on memory.

I'm curious whether the HBCC's position between the Infinity Fabric and the HBM controller in Vega makes it equivalently positioned to the coherent slave device in Ryzen. That role could mean the Ryzen host keeps its coherent slave, and until the GPU gets its own memory pool it lives with what Ryzen has, or an alternate coherent slave block is implemented. In that alternate scenario, it would seem that AMD still wouldn't need most of the HBCC's feature set.
 
I wonder whether a few on-package GDDR5(X)/6 chips on an APU might make sense with HBCC. I mean, with just a couple of fast GDDR chips, you could get over 100GB/s, and presumably 2GB.
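Quick numbers behind the ">100GB/s from a couple of chips" idea, assuming 32-bit devices at GDDR6-class 14Gbps (GDDR5X would come in somewhat lower):

Code:
#include <stdio.h>

/* Quick numbers for a couple of on-package GDDR chips. The 14Gbps figure
   assumes GDDR6-class signalling; GDDR5X would be somewhat lower. */
int main(void)
{
    int    chips         = 2;
    int    bits_per_chip = 32;     /* per-device interface width */
    double gbps_per_pin  = 14.0;   /* assumed data rate */

    double gb_per_s = chips * bits_per_chip * gbps_per_pin / 8.0;
    printf("Aggregate bandwidth: %.0f GB/s\n", gb_per_s);   /* 112 GB/s */
    return 0;
}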
 