Damien is at AMD now? While I'm happy for him, still, aww.
I sort of doubt Vega can do 12 TFLOPS at 225W, considering even at 300W it's power throttling to around 1450MHz.

Don't underestimate how much power rasterizers and ROPs can suck up. If you're just doing compute, the power profile of a GPU is very different.
Someone should try lowering the power target to 225W and see what happens.
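As a rough sanity check on what a 225W target might do to clocks: dynamic power scales roughly with f·V², and since voltage tends to rise roughly linearly with frequency near the top of the curve, power grows roughly with f³ there. A hedged back-of-envelope sketch (all figures are illustrative assumptions, not measurements; real GPUs follow DVFS curve tables, not a clean power law):

```python
# Back-of-envelope only: assumes P ~ f^3 near the top of the V/f curve.
# Reference point taken from the throttling observation above
# (~1450MHz at 300W); both numbers are forum claims, not measurements.

def clock_at_power(p_target_w, p_ref_w=300.0, f_ref_mhz=1450.0, exponent=3.0):
    """Estimate sustained clock at a lower power target, assuming P ~ f^exponent."""
    return f_ref_mhz * (p_target_w / p_ref_w) ** (1.0 / exponent)

print(f"Estimated clock at 225W: {clock_at_power(225.0):.0f} MHz")
```

Under these assumptions, dropping from 300W to 225W would cost only about 130MHz, which is why testing a lowered power target is interesting.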
Binning is basically a prerequisite for doing HSR in a tile (more efficient than just an early z test) before the pixel shader. This test shows that binning is not active there. If, and how effectively, HSR will work on Vega is another question. But to answer that question, binning has to be working first, because otherwise the GPU can't do HSR on the binned primitives. Binning should be a net positive even without HSR most of the time. The HSR stuff comes on top of it, not only reducing the external bandwidth requirements (which binning does even without HSR) but also saving actual work for the shader array. Those are basically two separate things, and HSR depends on a working binning rasterizer (but not the other way around).
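The dependency described above can be sketched as a toy model (this is illustrative Python, not Vega's actual pipeline; the primitive format and function names are assumptions): primitives are first binned per tile, and only then can HSR discard opaque primitives hidden within a bin.

```python
# Toy model: binning first, then hidden-surface removal (HSR) inside
# each bin. Illustrates why HSR depends on a working binning pass --
# it can only discard primitives relative to others in the same bin.

def bin_primitives(prims, tile_of_pixel):
    """Group primitives into per-tile bins by the pixels they cover."""
    bins = {}
    for p in prims:
        for px in p["pixels"]:
            bins.setdefault(tile_of_pixel(px), {})[id(p)] = p
    return {tile: list(d.values()) for tile, d in bins.items()}

def hsr_in_bin(bin_prims):
    """Front-to-back pass: drop primitives whose pixels are all covered
    by closer opaque primitives. Transparent primitives never cover."""
    survivors, covered = [], set()
    for p in sorted(bin_prims, key=lambda p: p["z"]):
        if p["pixels"] - covered:          # any visible pixel left?
            survivors.append(p)
            if p["opaque"]:
                covered |= p["pixels"]
    return survivors
```

Note that binning alone (the first function) already confines traffic to a tile; the HSR pass is the extra step that additionally saves shader work, and it consumes the bins the first step produces.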
Well, I agree that it is very optimistic to hope for this. But assuming that the current state really isn't using any of the architectural improvements over Fiji (besides the L2 backing of the ROPs), as the results are pretty much identical to what you'd expect from a heavily overclocked Fury X (~1.4GHz with slightly more exotic cooling), it wouldn't be completely impossible. That assumption requires quite a bit of faith, though. But keep in mind that nV claims their tiled approach saves them ~60% (!) of the external memory bandwidth (in combination with DCC, which Fiji was barely taking advantage of). Add in some work saved through HSR within bins when applicable, the allegedly overhauled work distribution in the frontend, and some slight IPC improvements in the CUs, and it may start to appear feasible. The alternative would be that AMD has severely messed up the Vega design.
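To put a claimed ~60% traffic saving in perspective, a quick worked example (the ~480 GB/s figure is roughly Vega FE's HBM2 paper spec, and the 60% is nV's claim for their own architecture, not a measured Vega number):

```python
# If a tiled rasterizer + DCC cut external memory traffic by a fraction
# s, the effective bandwidth multiplier is 1 / (1 - s).
raw_bw_gbs = 480.0   # approximate Vega FE HBM2 paper spec
savings = 0.60       # claimed saving from tiling + DCC (nV's number)

effective = raw_bw_gbs / (1.0 - savings)
print(f"Effective bandwidth: {effective:.0f} GB/s")
```

In other words, a 60% traffic reduction behaves like a 2.5x bandwidth multiplier, which is why getting the binning path working matters so much for a bandwidth-limited design.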
In the simplest case there will be just a new driver version for download in a few weeks or something. Imagine the driver stuff for all the enhanced GCN5 features were just not ready yet for prime time. They still had some issues with visual glitches or crashes, so they simply need some more polish.

Yes, the thing is: does AMD really launch a *professional* card, aimed at professionals and scientists, with older software when the final version was almost finished? I mean, that kind of software support is not something you can finish in a month, even less really, because you have to ship the cards with that software, so you need to finish it before starting production.
In a hypothetical world I could trust AMD's comments that "RX Vega will be much different", but in the real world I really fail to see how AMD could turn a product around so much in so little time.
And most amazingly, when NV launched Maxwell, the TBR just worked. NV did not even talk much about it, and nobody gave a damn whether the GPU used a fallback mode or not.
I see no good answer to the problem for AMD.
The best case is:
a) we sold you a $1000 card and decided to withhold much of its power from you for a month, because our PR department said it makes sense.
The other options are:
b) our driver team sucks
c) our hardware sucks
or a combination of b and c.
In the simplest case there will be just a new driver version for download in a few weeks or something. Imagine the driver stuff for all the enhanced GCN5 features were just not ready yet for prime time. They still had some issue with some visual glitches or crashes, so they simply need some more polish.
Yes, you can argue AMD shouldn't have launched the FE in the first place if that is the case. But they did and they may have preferred a stable operation over a faster but crashing driver.
Will be interesting to see what it is. I want a 1080 Ti-performance part for $500 sometime this fall, and I'm hoping Vega is it. It will be a nice jump from my 290.
AMD re-affirms RX Vega + others at SIGGRAPH (though technically "announce" could mean they'll just tell us it's coming in month X).
Yes, you can argue AMD shouldn't have launched the FE in the first place if that is the case. But they did, and they may have preferred a stable operation over a faster but crashing driver.

With the FE marketed more as a development card, hardware capabilities should matter more than performance and, to some degree, stability. Stability is still important, but applications are assumed to be buggy during development.
I'd assume for the professional market stable drivers are the most important aspect. Someone using these to make money will be pissed if halfway through a 12-hour render or what have you the thing crashes and they lose all that work.

Depends on the part of the "professional" market being discussed. I'm not sure FEs were designed for 12-hour renders and production as much as for debugging software. Out on the "Frontier" isn't the best place for mission-critical stability.
You can discard only primitives which don't have any influence on anything. You can't do it with transparent stuff, or when not only writing to the framebuffer with a z test. That means in the case of the triangle bin test (writing to a UAV independent of the outcome of the z test), no HSR can be performed. It will look exclusively at the binning behaviour.

The binning rasterizer in AMD's patent is fine with batching and binning primitives with transparency.
There is no reason why a UAV access should disable binning (reducing the batch size to 1 means disabling binning).
- Additionally, a batch may be generated using a set of received primitives based on a determination that a subsequently received primitive (not illustrated) is dependent upon the processing of at least one of first primitive 406, second primitive 408 and third primitive 410. Such a dependency may affect the final shading of the subsequently received primitive. In such a scenario, the primitive batch would not include the subsequently received primitive. Rather, a new primitive batch is generated for the subsequently received primitive, which can be processed once the current batch is processed and the dependency is resolved. The new primitive batch is processed at a later time.
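The batch-breaking rule in that patent excerpt can be sketched in a few lines (a toy illustration; the function names and the `depends_on` predicate are assumptions, since the patent doesn't specify how dependency is detected):

```python
# Sketch of the patent's batching rule: accumulate primitives into the
# current batch until one arrives whose shading depends on a primitive
# already in the batch; that primitive closes the current batch and
# opens a new one, which is processed only after the dependency resolves.

def build_batches(prims, depends_on):
    """depends_on(p, batch) -> True if p needs results from the batch."""
    batches, current = [], []
    for p in prims:
        if current and depends_on(p, current):
            batches.append(current)   # close the current batch
            current = []              # dependent primitive starts a new one
        current.append(p)
    if current:
        batches.append(current)
    return batches
```

The point being: a dependency doesn't disable binning, it just bounds how large a batch can grow before it has to be flushed.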
How is the Infinity Fabric integrated into Vega, and what could it mean as far as potential bottlenecks? Is it used to integrate new functions such as the cache controller for accessing additional working memory? Did it replace the crossbar for communication with the memory controllers? This is the first GPU with it, so I'm curious as to its use.

It's not clear where it has been inserted. At a minimum, it should be part of the interface touching the memory controllers. That may give another reason why the ROPs no longer touch the controllers directly: their caches don't speak a superset of HyperTransport, and there may have been less work needed on their internals by putting them behind another cache.
Marc will recruit someone good, I've no doubt about that.
BTW, what happened to "225W of power"?
If I look at the TFLOPS number, at the time they targeted a slightly slower chip, but if that small bump costs 75 more watts, damn, just go back to the initial target frequency...
Either the slide was fake, or it turned out that the design couldn't meet the original performance goals, so they had to crank up voltages and/or frequency to stay competitive.
Again, pardon my ignorance, but are there any compute-only benchmarks (something like Luxmark, maybe)? Have they been run on Vega yet? I'm wondering if the "NCUs" give better performance beyond clock-speed-based improvements.
With a score of 4690, the Radeon Vega Frontier Edition performs 41% faster than the Quadro P5000 (GTX 1080 equivalent) and the Radeon Pro Duo running on a single GPU (essentially a Fury X).
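Working backwards from those numbers (assuming the 41% figure is relative to the P5000's score, which the wording implies but doesn't state outright):

```python
# Derive the implied Quadro P5000 Luxmark score from the quoted figures.
vega_fe_score = 4690
speedup = 0.41                      # "41% faster than the P5000"

implied_p5000 = vega_fe_score / (1 + speedup)
print(f"Implied Quadro P5000 score: {implied_p5000:.0f}")
```

That puts the P5000 (and the single Pro Duo GPU) at roughly 3300, for anyone wanting to compare against published Luxmark results.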
Raja called Vega a SoC. What does this mean for a GPU?

Nothing, they've referred to GPUs as SoCs before too.