AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

I sort of doubt Vega can do 12 TFLOPS at 225 W, considering even at 300 W it's power throttling to ~1450 MHz.

Someone should try lowering the power target to 225W and see what happens.
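For a rough sense of what 12 TFLOPS implies clock-wise, here's a back-of-envelope sketch (it assumes Vega 10's 4096 FP32 ALUs and FMA throughput; just arithmetic, not a spec):

# Clock needed for 12 TFLOPS FP32 on a 4096-ALU part.
# Assumes 64 CUs x 64 ALUs and 2 FLOPs per ALU per clock (FMA).
alus = 4096
flops_per_clock = alus * 2
required_clock_mhz = 12e12 / flops_per_clock / 1e6
print(f"~{required_clock_mhz:.0f} MHz sustained needed")  # ~1465 MHz

So 12 TFLOPS needs roughly 1465 MHz sustained, which is right about where it's throttling to at 300 W.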
 
I guess if I just said "yes" that would count as trolling, right? I have faith that the influx of capable people in a variety of roles into RTG will have an effect. Scott, Rys and now Damien, to name the last few I am aware of.
Damien is at AMD now? While I'm happy for him, still, aww. :(

I sort of doubt Vega can do 12 TFLOPS at 225 W, considering even at 300 W it's power throttling to ~1450 MHz.

Someone should try lowering the power target to 225W and see what happens.
Don't underestimate how much power rasterizers and ROPs can suck up. If you're just doing compute, the power profile of a GPU is very different.
 
Binning is basically a prerequisite for doing HSR within a tile (which is more efficient than just an early-Z test) before the pixel shader runs. This test shows that binning is not active there. Whether and how effectively HSR will work on Vega is another question, but to answer it, binning has to be working first, because otherwise the GPU can't do HSR on the binned primitives. Binning should be a net positive most of the time even without HSR. The HSR stuff comes on top of it, reducing not only the external bandwidth requirements (which binning does even without HSR) but also saving actual work for the shader array. Those are basically two separate things, where HSR depends on a working binning rasterizer (but not the other way around).
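To make the dependency concrete, here is a toy software sketch (my own simplification, not AMD's actual hardware algorithm) of why HSR only becomes possible once primitives are binned per tile. It assumes, as in the triangle bin test, that every primitive fully covers the tile it lands in:

def bin_primitives(primitives, tile_size):
    """Step 1: assign each primitive to the screen tiles its bounding box overlaps."""
    bins = {}
    for prim in primitives:
        x0, y0, x1, y1 = prim["bbox"]
        for ty in range(y0 // tile_size, y1 // tile_size + 1):
            for tx in range(x0 // tile_size, x1 // tile_size + 1):
                bins.setdefault((tx, ty), []).append(prim)
    return bins

def hsr_in_bin(prims):
    """Step 2 (only possible after binning): drop opaque primitives hidden behind
    the front-most opaque one, so they never reach the pixel shader.
    Simplification: assumes each primitive fully covers the tile."""
    opaque = [p for p in prims if p["opaque"]]
    if not opaque:
        return prims
    front_z = min(p["z"] for p in opaque)
    return [p for p in prims if not p["opaque"] or p["z"] <= front_z]

Step 1 alone already keeps a tile's working set on chip (the bandwidth win); step 2 additionally removes shader work, which is why it needs step 1 but not vice versa.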

=======================================

Well, I agree that it is very optimistic to hope for this. But assuming that the current state really doesn't use any of the architectural improvements over Fiji (besides the L2 backing of the ROPs), since the results are pretty much identical to what you'd expect from a heavily overclocked Fury X (~1.4 GHz with slightly more exotic cooling), it wouldn't be completely impossible. That assumption requires quite a bit of faith, though. But keep in mind that NV claims their tiled approach saves them ~60% (!) of external memory bandwidth (in combination with DCC, which Fiji barely took advantage of). Add in some work saved through HSR within bins where applicable, the allegedly overhauled work distribution in the frontend, and some slight IPC improvements in the CUs, and it may start to appear feasible. The alternative would be that AMD has severely messed up the Vega design.
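Just to put that 60% figure into perspective, a trivial calculation (the HBM2 number is Vega FE's ~484 GB/s; applying NV's claimed saving to Vega is purely hypothetical):

# If tiling + DCC really removed ~60% of external traffic, the bandwidth would
# go as far as this much "naive" immediate-mode traffic:
raw_bw = 484          # GB/s, Vega FE HBM2 (assumed)
saving = 0.60         # claimed fraction of traffic removed (assumed, NV's figure)
effective = raw_bw / (1 - saving)
print(f"~{effective:.0f} GB/s equivalent")   # ~1210 GB/s of naive traffic served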

Yes, the thing is: does AMD really launch a *professional* card aimed at professionals and scientific users with older software, when the final version was almost finished? I mean, that kind of software support is not something you can finish in a month, even less so because you have to ship the cards with that software, so you need to finish it before starting production.

In a hypothetical world I could trust AMD's comments that "RX Vega will be much different", but in the real world I really fail to see how AMD could turn a product around so much in so little time.
 
And most amazingly, when NV launched Maxwell the TBR just worked. NV did not even talk much about it, and nobody gave a damn whether the GPU used a fallback mode or not.

I see no good answer to the problem for AMD.

The best case is

a) we sold you a $1000 card and decided to withhold much of its power from you for a month, because our PR department said it makes sense

the other options are

b) our driver team sucks
c) our hardware sucks

or a combination of b and c
 
How is the Infinity Fabric integrated into Vega, and what could it mean as far as potential bottlenecks? Is it used to integrate new functions such as the cache controller for accessing additional working memory? Did it replace the crossbar for communication with the memory controllers?

This is the first GPU with it, so I'm curious as to its use.
 
Yes, the thing is: does AMD really launch a *professional* card aimed at professionals and scientific users with older software, when the final version was almost finished? I mean, that kind of software support is not something you can finish in a month, even less so because you have to ship the cards with that software, so you need to finish it before starting production.

In a hypothetical world I could trust AMD's comments that "RX Vega will be much different", but in the real world I really fail to see how AMD could turn a product around so much in so little time.
In the simplest case there will just be a new driver version for download in a few weeks or something. Imagine the driver support for all the enhanced GCN5 features was just not ready yet for prime time. It still had issues with visual glitches or crashes, so it simply needs some more polish.
Yes, you can argue AMD shouldn't have launched the FE in the first place if that is the case. But they did, and they may have preferred stable operation over a faster but crashing driver.
 
And most amazingly, when NV launched Maxwell the TBR just worked. NV did not even talk much about it, and nobody gave a damn whether the GPU used a fallback mode or not.

I see no good answer to the problem for AMD.

The best case is

a) we sold you a $1000 card and decided to withhold much of its power from you for a month, because our PR department said it makes sense

the other options are

b) our driver team sucks
c) our hardware sucks

or a combination of b and c

How do you know Maxwell's TBR just worked? It received performance upgrades consistently for a year. All of AMD's GPUs have performed better as drivers improved, and most of them ended up besting cards that were originally faster.

So your example in a) sounds silly because no one would pitch it that way. They would say: buy our $1k card today and look forward to constant performance improvements as software and drivers continue to take advantage of the new features.


I will hold off for gaming Vega before making my judgement on the card. I've had plenty of AMD cards that have aged extremely well compared to Nvidia cards.
 
In the simplest case there will just be a new driver version for download in a few weeks or something. Imagine the driver support for all the enhanced GCN5 features was just not ready yet for prime time. It still had issues with visual glitches or crashes, so it simply needs some more polish.
Yes, you can argue AMD shouldn't have launched the FE in the first place if that is the case. But they did, and they may have preferred stable operation over a faster but crashing driver.

I'd assume for the professional market stable drivers are the most important aspect. Someone using these to make money will be pissed if, halfway through a 12-hour render or what have you, the thing crashes and they lose all that work.
 
I have my invitation, as it is a holiday.. see you there.. !!!!!

If you are bored, come to the Blender sessions, there's a lot of news.. a lot..
 
Yes, you can argue AMD shouldn't have launched the FE in the first place if that is the case. But they did, and they may have preferred stable operation over a faster but crashing driver.
With the FE marketed more as a development card, hardware capabilities should matter more than performance and, to some degree, stability. Stability is still important, but applications are assumed to be buggy during development.

http://www.phoronix.com/scan.php?page=news_item&px=ROCm-1.6-Released

ROCm 1.6 landed yesterday; no idea/notes on what it adds beyond Vega support. Presumably there are deep learning devs more concerned with HBCC and packed math, which should still work for developers.
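For anyone wondering what "packed math" refers to here: Vega can treat a 32-bit register lane as two FP16 values and operate on both per instruction. A minimal host-side model of the idea (plain Python, not GPU code; loosely modelled on the Vega ISA's v_pk_add_f16):

import struct

def pack_2xfp16(a, b):
    """Pack two FP16 values into one 32-bit word."""
    return struct.unpack("<I", struct.pack("<ee", a, b))[0]

def unpack_2xfp16(word):
    return struct.unpack("<ee", struct.pack("<I", word))

def packed_add(x, y):
    """Model of a v_pk_add_f16-style op: both halves added in one 'instruction'."""
    xa, xb = unpack_2xfp16(x)
    ya, yb = unpack_2xfp16(y)
    return pack_2xfp16(xa + ya, xb + yb)

print(unpack_2xfp16(packed_add(pack_2xfp16(1.5, 2.0), pack_2xfp16(0.25, -1.0))))
# -> (1.75, 1.0)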

I'd assume for the professional market stable drivers are the most important aspect. Someone using these to make money will be pissed if, halfway through a 12-hour render or what have you, the thing crashes and they lose all that work.
Depends on the part of the "professional" market being discussed. I'm not sure FEs were designed for 12-hour renders and production so much as for debugging software. Out on the "Frontier" isn't the best place for mission-critical stability.
 
You can only discard primitives which don't have any influence on anything. You can't do it with transparent stuff, or when you're doing more than just writing to the framebuffer with a Z test. That means in the case of the triangle bin test (which writes to a UAV independent of the outcome of the Z test) no HSR can be performed; it looks exclusively at the binning behaviour.
There is no reason why a UAV access should disable binning (reducing the batch size to 1 would mean disabling binning).
The binning rasterizer in AMD's patent is fine with batching and binning primitives with transparency.
The triangle bin test's having every pixel read and write a common value seems like it would be a dependence of some sort, and that could potentially meet a close-batch condition.
The triangle test may place a secondary emphasis on HSR, since it counts on the opacity of the triangles to help demonstrate the behavior, and there would be a very stark difference in behavior if AMD's batch-and-bin process were taken to its most extreme outcome.

However, I may have conflated that consideration, which is specific to the test, with the more general behavior of the batching step. Even if the triangles were transparent, the question remains how the batching would proceed, and in what sequence the hardware bins the current batch and pulls in the next one.
The patent is written to indicate that the rasterizer iteratively processes through a batch across all bins, but whether that serializes the whole process is not clear. A non-contrived scene without perfectly overlapping triangles might present some other kind of distribution, or give the tiling hardware a less rigid sequence of bins to work with.

edit: It slipped my mind earlier, but I was able to track this down again:
http://www.google.com/patents/US20160371873
This is a continuation of the prior patent, and has more details on how the batching and bin intercept processes work.
There are more details as to why a batch can be closed as well.
  • Additionally, a batch may be generated using a set of received primitives based on a determination that a subsequently received primitive (not illustrated) is dependent upon the processing of at least one of first primitive 406, second primitive 408 and third primitive 410. Such a dependency may affect the final shading of the subsequently received primitive. In such a scenario, the primitive batch would not include the subsequently received primitive. Rather, a new primitive batch is generated for the subsequently receive primitive, which can be processed once the current batch is processed and the dependency is resolved. The new primitive batch is processed at a later time.
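Read that way, the batch-closing rule in the patent excerpt is simple enough to sketch (purely illustrative; the dependency predicate is a stand-in for whatever shading/ordering dependence the hardware actually tracks):

def build_batches(primitives, depends_on):
    """Accumulate primitives into a batch until an incoming primitive depends on
    one already in the current batch; then close the batch and start a new one,
    which is processed once the current batch is done and the dependency resolved."""
    batches, current = [], []
    for prim in primitives:
        if current and any(depends_on(prim, earlier) for earlier in current):
            batches.append(current)     # close; this batch gets binned/processed first
            current = []
        current.append(prim)
    if current:
        batches.append(current)
    return batches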

How is the Infinity Fabric integrated into Vega, and what could it mean as far as potential bottlenecks? Is it used to integrate new functions such as the cache controller for accessing additional working memory? Did it replace the crossbar for communication with the memory controllers?

This is the first GPU with it, so I'm curious as to its use.
It's not clear where it has been inserted. At a minimum, it should be part of the interface touching the memory controllers. That may give another reason why the ROPs no longer touch the controllers directly: their caches don't speak a superset of HyperTransport, and putting them behind another cache may have meant less work on their internals.

How it connects to the L2, or possibly the CUs, is unclear. The numbers given for Vega's fabric bandwidth would make it worse for the CU-L2 interface, and I suspect a HyperTransport-style packet won't be as lightweight as the current cache management.

There's a non-data portion of the fabric related to control that might be plugged into various blocks. That's probably there, and it would be used to, among other things, carry data for DVFS to the hardware that manages it. What granularity it plugs into the hardware at may be interesting. The more integrated the fabric is, the more area goes to non-execution resources, and the more individual blocks might need design changes.
 
Raja called Vega an SoC. What does this mean for a GPU?

Yes, the thing is: does AMD really launch a *professional* card aimed at professionals and scientific users with older software, when the final version was almost finished? I mean, that kind of software support is not something you can finish in a month, even less so because you have to ship the cards with that software, so you need to finish it before starting production.

In a hypothetical world I could trust AMD's comments that "RX Vega will be much different", but in the real world I really fail to see how AMD could turn a product around so much in so little time.

Compute should be simpler for AMD to deal with than graphics. I don't think their "professional" driver can be called older software or not ready. Launch what you have working now and make money off it. It's not a gaming card anyway, so who cares? RX Vega would likely have launched alongside Vega FE if they were ready with the gaming side of things.
 
Marc will recruit someone good, I've no doubt about that.

BTW, what happened to "225 W of power"?

[Attached image: AMD-VEGA-10-specifications.jpg]


If I look at the FLOPS number, at the time they were targeting a slightly slower chip, but if that small bump = 75 more watts, damn, just go back to the initial target frequency...



Either the slide was fake, or it turned out that the design couldn't meet the original performance goals, so they had to crank up voltages and/or frequency to stay competitive.
 
Either the slide was fake, or it turned out that the design couldn't meet the original performance goals, so they had to crank up voltages and/or frequency to stay competitive.

PCPer reported the card was dropping to lower power tiers without lowering clock speed. Could be the power management is not working well currently. The levels were 280 W, 240 W and 200 W without changing frequency (as far as they could tell), and it happened within temperature limits, which is odd.
 
Again, pardon my ignorance, but are there any compute-only benchmarks (something like Luxmark, maybe)? Have they been run on Vega yet? I'm wondering if the "NCUs" give better performance beyond clock-speed-based improvements.
 
Again, pardon my ignorance, but are there any compute-only benchmarks (something like Luxmark, maybe)? Have they been run on Vega yet? I'm wondering if the "NCUs" give better performance beyond clock-speed-based improvements.

PCPer ran Luxmark. https://www.pcper.com/reviews/Graph...B-Air-Cooled-Review/Professional-Testing-SPEC
With a score of 4690, the Radeon Vega Frontier Edition performs 41% faster than the Quadro P5000 (GTX 1080 equivalent) and than the Radeon Pro Duo running on a single GPU (essentially a Fury X).
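Working backwards from that quote (simple division, nothing measured by me), the 41% lead implies roughly:

vega_fe_score = 4690
p5000_implied = vega_fe_score / 1.41
print(f"Quadro P5000 ~ {p5000_implied:.0f}")   # ~3326 in the same Luxmark run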
 