Async compute is a very good idea, but ironically it has more potential on GPU architectures that aren't very good at keeping all their cores busy (workload permitting).
Do you think leaving ALUs idle is a good thing? Do you think the console devs using this in their games is an illusion?
> Hmm, ~234mm² for Polaris 10, according to VideoCardz. That's somewhere between Pitcairn and RV770.

I believe the LinkedIn leak said 232mm². The guy knew what he was working on.
> Do you think leaving ALUs idle is a good thing? Do you think the console devs using this in their games is an illusion?

Ideally, your ALUs are kept busy without explicit hints. If you cannot do that, you would need Async. The less you can do that, the greater the profit from AC.
> Leaving them idle is definitely not a good thing. Even better would be minimizing the amount of idle time to begin with. Async is a useful scalpel to shape workloads to match a given architecture's capabilities.

Isn't that what they just did? Leave compute tasks capable of running asynchronously in a separate queue so they can be scheduled optimally by the driver. A strategy that seems to require very little time from developers. There isn't even a requirement that it needs to execute asynchronously.
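For illustration, this is roughly what "leave it on a separate queue" amounts to in D3D12. It's a minimal sketch with my own function names; device creation, allocators, PSOs and error handling are assumed to live elsewhere. The API only exposes the opportunity for overlap, and the driver/hardware decide whether anything actually runs concurrently.

```cpp
// Minimal sketch: function names are mine; device setup, command allocators,
// pipeline state and error handling are assumed to exist elsewhere.
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// All "using async compute" really asks of the app: put compute work on a queue
// of type COMPUTE instead of the direct (graphics) queue.
ComPtr<ID3D12CommandQueue> CreateComputeQueue(ID3D12Device* device)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type     = D3D12_COMMAND_LIST_TYPE_COMPUTE;          // separate queue from graphics
    desc.Priority = D3D12_COMMAND_QUEUE_PRIORITY_NORMAL;

    ComPtr<ID3D12CommandQueue> queue;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));  // check the HRESULT in real code
    return queue;
}

// Per frame: hand the recorded compute command list to that queue. Whether it overlaps
// with graphics work is entirely up to the driver/hardware; serial execution is legal.
void SubmitAsyncCompute(ID3D12CommandQueue* computeQueue, ID3D12GraphicsCommandList* computeList)
{
    ID3D12CommandList* const lists[] = { computeList };
    computeQueue->ExecuteCommandLists(1, lists);
}
```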
The numbers coming out now aren't really demonstrating any tangible benefit, hence my question. Is there meat to this, or is it just hype?
> Ideally, your ALUs are kept busy without explicit hints. If you cannot do that, you would need Async. The less you can do that, the greater the profit from AC.

Ideally, but it's a lot easier to scale ALUs than TMUs and ROPs, and crafting shaders to use the extra ALUs all the time is another problem. Being able to balance bottlenecks across kernels seems a beneficial solution. Why disable/idle ALUs at the beginning of the frame just to be restricted by ALU throughput later?
AMD is pushing async real hard. Do they genuinely believe it's such a big deal or is it just marketing?
http://www.amd.com/en-us/innovations/software-technologies/radeon-polaris#
Async compute is quite nice from an efficiency standpoint. If you're clever and running the right sort of program (e.g. a game) you can get 10-20% more performance out of a GPU. Though half the reason they're PRing it so much is that NVIDIA doesn't have proper support.
> Isn't that what they just did? Leave compute tasks capable of running asynchronously in a separate queue so they can be scheduled optimally by the driver. A strategy that seems to require very little time from developers. There isn't even a requirement that it needs to execute asynchronously.

Sort of, yeah. But what I meant was a hardware solution that works automatically, so every application that uses a lot of GPU power profits from it.
> Ideally, but it's a lot easier to scale ALUs than TMUs and ROPs, and crafting shaders to use the extra ALUs all the time is another problem. Being able to balance bottlenecks across kernels seems a beneficial solution. Why disable/idle ALUs at the beginning of the frame just to be restricted by ALU throughput later?

Yeah, why would you disable ALUs? No point in it, but you have to work extra hard to keep them busy at all times. All times, as in not only when special programs trigger certain hints.
> It's no wonder Nvidia seems eager to paper launch their products, with GP100 "launched" without even a mention of availability and GP104 paper launching sometime this month, even though availability isn't supposed to happen until Computex. Well, if it all adds up then good for AMD.

AMD kicked off their paper launch of Polaris in December for reasons unknown, so I don't see the big deal.
> I'm not quite sure how much say the driver has over scheduling of the various queues.

The developer wouldn't have control beyond the driver level, short of a convoluted use of barriers. Which part actually does the scheduling likely depends on the architecture, but beyond the driver it's out of the developer's hands anyway.
> Everything I've read so far says quite the opposite. Devs need to be careful and deliberate when using async. Improper usage can hose performance (fighting for the same resources, cache thrashing, etc.).

Careful, yes, but that's always the case, since you could stall the pipeline. Oxide, for example, said they implemented it in a weekend just to see what it could do. That doesn't sound difficult to me. I'm not sure we've heard from console devs beyond reports of significant performance gains. I think the issue is some devs overthinking the implementation. You just want to ensure the compute queues are filled while executing graphics, not select two compute tasks and add barriers to try to force them to execute concurrently. Even with the compute queue filled, there is no reason to expect tasks not to execute serially. It shouldn't be that different from hyperthreading, where you have a second thread to schedule on available hardware.
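To make the "don't force pairings, just keep the queue fed" point concrete, here is a hypothetical D3D12 frame skeleton (my own names and structure, not anything Oxide has published): graphics and async-able compute are submitted independently, and only the pass that consumes the compute output waits on a fence.

```cpp
// Hypothetical frame skeleton (names are mine; command lists assumed recorded already).
// Graphics and async-able compute are submitted independently; the only sync point is
// a fence before the pass that actually reads the compute output. No per-task pairing,
// no barriers trying to force two workloads to run side by side.
#include <windows.h>
#include <d3d12.h>

void SubmitFrame(ID3D12CommandQueue* gfxQueue,
                 ID3D12CommandQueue* computeQueue,
                 ID3D12GraphicsCommandList* gfxList,      // e.g. shadow maps / G-buffer
                 ID3D12GraphicsCommandList* asyncList,    // e.g. light culling, particles
                 ID3D12GraphicsCommandList* consumeList,  // graphics that reads the compute results
                 ID3D12Fence* fence,
                 UINT64 fenceValue)
{
    // Keep the graphics queue busy with the frame as usual.
    ID3D12CommandList* const gfx[] = { gfxList };
    gfxQueue->ExecuteCommandLists(1, gfx);

    // Keep the compute queue fed with anything that has no dependency yet. If the shader
    // engines have spare ALU cycles it can overlap; if not, it simply runs later.
    ID3D12CommandList* const compute[] = { asyncList };
    computeQueue->ExecuteCommandLists(1, compute);
    computeQueue->Signal(fence, fenceValue);              // marks completion of the async work

    // Only the consumer waits. This is a GPU-side wait; the CPU is not blocked.
    gfxQueue->Wait(fence, fenceValue);
    ID3D12CommandList* const consume[] = { consumeList };
    gfxQueue->ExecuteCommandLists(1, consume);
}
```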
> Sort of, yeah. But what I meant was a hardware solution that works automatically, so every application that uses a lot of GPU power profits from it.

At least on newer GCN versions I think it does work automatically. It has to do some sort of tuning, unless it's using a round-robin dispatch of all the queues, or you'd just flood the hardware with a single available kernel. I really haven't seen any clarification from AMD on just how they select wavefronts for scheduling. If you follow the thought process that they wanted concurrent execution, having the scheduler target ratios of graphics:compute or fetch:ALU doesn't seem unreasonable. I'd imagine it's not available because Nvidia is still working out the details for their implementation.
> Yeah, why would you disable ALUs? No point in it, but you have to work extra hard to keep them busy at all times. All times, as in not only when special programs trigger certain hints.

Power efficiency, going off that recent patent. Throughput would reduce to whatever was required by disabling or downclocking ALUs. You would basically guarantee the hardware was always close to full utilization.
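As a toy illustration of that idea (nothing to do with the actual patent text, and every number here is made up): if you power-gate whole CUs down to the level the workload needs, whatever stays on runs close to fully occupied.

```cpp
// Toy arithmetic only: not the patent's mechanism, and all figures are invented.
// The idea: instead of letting all CUs idle part of the time, gate enough of them off
// that the ones left powered stay close to fully utilized.
#include <cmath>
#include <cstdio>

int main()
{
    const int    totalCUs     = 40;    // hypothetical part
    const double tflopsPerCU  = 0.15;  // made-up per-CU throughput at some clock
    const double demandTflops = 2.3;   // what the current workload actually needs

    // Keep just enough CUs powered to cover demand; gate the rest.
    int activeCUs = static_cast<int>(std::ceil(demandTflops / tflopsPerCU));
    if (activeCUs > totalCUs) activeCUs = totalCUs;

    const double utilization = demandTflops / (activeCUs * tflopsPerCU);
    std::printf("power %d of %d CUs -> active-CU utilization %.0f%%\n",
                activeCUs, totalCUs, utilization * 100.0);
    return 0;
}
```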
> Which either way is devastatingly small and low power if even the 2x performance per watt claim shows up.

So AMD has claimed that the division in perf/W increase is 30% for design and 70% for process.
Based on the Fury X vs Titan X numbers on hardware.fr, the perf/W ratio between the 2 is almost exactly 1.30 (37.3 / 28.7). And that's despite Fury X benefitting from HBM power efficiency, so the numbers are a bit flattering for AMD in terms of architecture efficiency.
Nvidia is largely keeping the architecture of Maxwell, so let's apply 70% process benefit, for a perf/W ratio of 63.4. For AMD to get even with Nvidia, it needs to improve perf/W by 63.4/28.7 = 2.21. But taking into account the lack of HBM, it's probably a bit more.
I think that's very reasonable and achievable. I also think that's about the best we can expect from AMD. I don't see at all how that would be devastating. It's just AMD catching up to where Nvidia was 2 years ago in terms of architecture alone.
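Just to re-run that arithmetic in one place (this assumes the 37.3 is Titan X and the 28.7 is Fury X, which is the only reading that makes the 2.21 fall out, and it takes "70% from process" at face value as a 1.7x factor):

```cpp
#include <cstdio>

int main()
{
    // hardware.fr perf/W figures quoted above (assumed: 37.3 = Titan X, 28.7 = Fury X)
    const double titanX = 37.3;
    const double furyX  = 28.7;

    const double currentGap = titanX / furyX;    // ~1.30x in Nvidia's favour today
    const double pascalEst  = titanX * 1.7;      // ~63.4 if process alone is worth ~70%
    const double amdNeeded  = pascalEst / furyX; // ~2.21x improvement for parity

    std::printf("gap %.2fx, Pascal estimate %.1f, AMD needs %.2fx\n",
                currentGap, pascalEst, amdNeeded);
    return 0;
}
```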
Eh, that highly depends on what you're benchmarking. If you're looking at newer DX12 stuff like Hitman in DX12 mode or Ashes of the Singularity, AMD already has a performance, and even performance-per-watt, advantage over Nvidia, but who knows if that's just those games or whether it will continue to translate to other new titles? Either way, 980 Ti-like performance at 150 watts would be around 20% more efficient than Nvidia's GP100 numbers, and that comparison is based solely on Nvidia's own figures, so there's no "this title vs. that title" caveat.
Which is obviously great for AMD, but that's assuming there's actually a 980ti like performance at 150 watts. If it's 175 watts then it's much closer to parity for the two. And the January thing was just a tease, so it definitely seems to me that Nvidia is eager for whatever reason to put out all the information they have on their new cards, while AMD is, for whatever reason, more content to wait/tease little by little.
> Which is obviously great for AMD, but that's assuming there's actually a 980ti like performance at 150 watts. If it's 175 watts then it's much closer to parity for the two.

The hardware.fr numbers don't use canned 150W or 175W numbers. They use measured power. So let's ignore marketing slogans for a change?
> And the January thing was just a tease, so it definitely seems to me that Nvidia is eager for whatever reason to put out all the information they have on their new cards, while AMD is, for whatever reason, more content to wait/tease little by little.

It was a tease with power numbers, performance suggestions and comparisons with Nvidia GPUs. I call that the start of a drawn out paper launch. YMMV.
Well, look at it this way: AMD would have used the best case to "market" their hardware. If the best case is 2.5x perf per watt, they may have gotten close to Maxwell 2 in this department. (They are definitely using Hawaii as their baseline, not Fury, since Fury has HBM; if they were using Fury, it wouldn't be 2.5x, that's for sure, once you normalize it by removing the additional % gain from HBM.)