AMD: Speculation, Rumors, and Discussion (Archive)

And here I am, under the impression that at least this forum would be a bit beyond simple repetition of marketing terms. :|
edit: referring to a couple of posts a page or so further back
 
It was never their intention to cede anything. I don't think any company in any market would ever cede a segment if it could possibly avoid it. Yeah, timelines can shift a little, so there will be times when the others can take advantage. Just like what Geeforcer pointed out with the 2xx line for nV: it was a case of stretching a particular architecture for too long and giving its competition the chance to catch up. I don't think nV will let something like that happen if at all possible. If it happens, then they screwed up somewhere.

If a company has a certain advantage, it's up to them to press that advantage; if they don't, they are doing something wrong on their end, or they took the wrong direction to begin with and are trying to fix it.

Well, I said temporarily. As an aside, companies reposition themselves all the time; of the innumerable examples, a few that come to mind are that in the last decade AMD went fabless, sold its mobile handset division, and quit the dense microserver business. Anyway, the upshot of my post was simply that AMD does not appear to be competing with GP104 until Vega; hence, temporarily ceding the high end. The reason I used the language I did is that it's rather unusual for a GPU company to intentionally position itself as cheap and cheerful, which is how I read Taylor's statement. He basically said, "Pascal is caviar for fancy men, come on by the AMD buffet and get yourself a chicken fried Polaris." Now maybe that's just what you say when your big chips aren't done yet, but typical GPU vendor PR is, "ours is the best, if not now then very soon."
 
Now maybe that's just what you say when your big chips aren't done yet, but typical GPU vendor PR is, "ours is the best, if not now then very soon."
How well would that work if, say, I told you that GP20x is due in Q1 2017 (a purely fictional example!)?
Right: sales of the GP10x series would drop dead, even if the second revision weren't expected to be much of an improvement in actual performance.

AMD is in a strange position. They want to sell off the remaining Fiji chips, but everyone is already waiting for Vega, and that's before Polaris has even officially launched. Even if they HAD released a big Polaris now, it would only have worsened the situation, as Vega would then have deprecated not one but two designs.
For the very same reason I also doubt that AMD will release any dedicated low-end chips based on the Vega architecture before Navi, even if they could. Not with Navi already scheduled for 2018 and Polaris just about to be released.

I seriously doubt that Polaris was supposed to become the mere gap filler it now is. The schedule looks just plain wrong, or at least far too compressed.
 
AMD is in a strange position. [...]
I seriously doubt that Polaris was supposed to become the mere gap filler it now is. The schedule looks just plain wrong, or at least far too compressed.

A very strange position. I suppose it's the nature of having fewer resources than a competitor that forces reactive moves and throws off their cadence. It's becoming reminiscent, I think, of the battle with Intel, where they fight every battle with half an army and thereby never have much chance of winning one. I'm clearly digressing into execution-gloom country...

On the topic of HW speculation, do you think Vega will be distinct enough as an architecture to even merit replacing Polaris in its segment? I think we're going to continue to see incremental improvements, as ever with GCN. So in that sense Vega will be big Polaris + tweaks.
 
Yes, it is. At least that's one possible implementation, as it technically doesn't need geometry of any sort.
Well, it's up to developers to implement whatever they want. And yes, for some scenarios a compute shader will be enough. However, I don't think you'll get by in most cases without redrawing anything. What about objects that are really close to the player, such as a weapon or cockpit? What about GUI elements such as a health bar or crosshair?
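To make the "timewarp as pure compute" point concrete, here's a minimal CUDA sketch of the idea and nothing more. It assumes the head-rotation delta has already been folded into a 3x3 homography; the Mat3 struct, the kernel name, and the buffer sizes are all my own inventions, not any SDK's actual timewarp. All it does is re-sample the previously rendered frame per pixel, with no geometry pipeline involved.

```cuda
#include <cuda_runtime.h>

// Hypothetical 3x3 homography (row-major) standing in for the head-rotation delta.
struct Mat3 { float m[9]; };

// Toy reprojection: each output pixel fetches from the old frame, warped by H.
// No vertices, no rasterizer, no ROPs -- pure per-pixel compute.
__global__ void timewarp(const uchar4 *src, uchar4 *dst, int w, int h, Mat3 H)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    // Map the output pixel back into the source frame.
    float sw = H.m[6] * x + H.m[7] * y + H.m[8];
    int u = (int)((H.m[0] * x + H.m[1] * y + H.m[2]) / sw);
    int v = (int)((H.m[3] * x + H.m[4] * y + H.m[5]) / sw);

    // Nearest-neighbour fetch; black outside the old frame's footprint.
    dst[y * w + x] = (u >= 0 && u < w && v >= 0 && v < h)
                         ? src[v * w + u]
                         : make_uchar4(0, 0, 0, 255);
}

int main()
{
    const int W = 1280, Hpx = 1440;               // made-up per-eye resolution
    uchar4 *src, *dst;
    cudaMalloc(&src, W * Hpx * sizeof(uchar4));
    cudaMalloc(&dst, W * Hpx * sizeof(uchar4));
    cudaMemset(src, 0, W * Hpx * sizeof(uchar4)); // pretend this is the last frame

    Mat3 H = {{1, 0, 0, 0, 1, 0, 0, 0, 1}};       // identity = no head movement
    dim3 block(16, 16), grid((W + 15) / 16, (Hpx + 15) / 16);
    timewarp<<<grid, block>>>(src, dst, W, Hpx, H);
    cudaDeviceSynchronize();

    cudaFree(src);
    cudaFree(dst);
    return 0;
}
```

Which is also exactly why the objection above stands: anything that genuinely has to be redrawn (near-field geometry, HUD elements) falls outside what a pass like this can do.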
 
With GP100 having fine-grained compute preemption, there's a good chance that graphics preemption will be present as well. If so, then no fundamental benefit here either. So let's defer this one for later...
This does seem possible, although if AMD is any indication, it can take longer to make the graphics pipeline, with its larger amount of context, amenable to preemption.
Kaveri's architectural level had the ability to preempt compute kernels, but it was not until Carrizo that GCN received graphics preemption.
 
What specifically does compute have to do with VR that it doesn't have with the rest of rendering in modern games? Rendering is a graphics task; async timewarp for VR is a graphics task.
True, but the timewarp is rather time-sensitive, and a compute version should complete more quickly since you wouldn't need to worry about geometry. AMD also favored compute over graphics where preemption is concerned. How Nvidia implemented it prior to Pascal I'm not exactly sure.
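For what it's worth, the "favor the time-critical compute work" idea is already exposed on the CUDA side as stream priorities. A hedged sketch follows; the two dummy kernels and their names are mine, and how aggressively the hardware actually honors the priority (drain vs. freeze) is exactly what's being debated here.

```cuda
#include <cuda_runtime.h>

__global__ void bulkWork(float *data, int n)         // stand-in for the frame's ordinary compute
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 0.5f + 1.0f;
}

__global__ void latencyCritical(float *data, int n)  // stand-in for a timewarp-style pass
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main()
{
    // Smaller numbers mean higher priority; the range is device-dependent.
    int lowest, highest;
    cudaDeviceGetStreamPriorityRange(&lowest, &highest);

    cudaStream_t normal, urgent;
    cudaStreamCreateWithPriority(&normal, cudaStreamNonBlocking, lowest);
    cudaStreamCreateWithPriority(&urgent, cudaStreamNonBlocking, highest);

    const int N = 1 << 20;
    float *a, *b;
    cudaMalloc(&a, N * sizeof(float));
    cudaMalloc(&b, N * sizeof(float));
    cudaMemset(a, 0, N * sizeof(float));
    cudaMemset(b, 0, N * sizeof(float));

    bulkWork<<<N / 256, 256, 0, normal>>>(a, N);         // plenty of background work...
    latencyCritical<<<N / 256, 256, 0, urgent>>>(b, N);  // ...which this should jump ahead of

    cudaDeviceSynchronize();
    cudaFree(a);
    cudaFree(b);
    cudaStreamDestroy(normal);
    cudaStreamDestroy(urgent);
    return 0;
}
```

On pre-Pascal hardware the "urgent" work still only gets resources as the other kernel's blocks retire; the finer-grained preemption being discussed is what would let it jump in immediately instead.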

On the topic of HW speculation, do you think Vega will be distinct enough as an architecture to even merit replacing Polaris in its segment? I think we're going to continue to see incremental improvements, as ever with GCN. So in that sense Vega will be big Polaris + tweaks.
Looking at the roadmap, it doesn't seem that distinct. HBM2 is added, likely down to availability and it only making sense on higher-end parts. The rest is likely more of the fine-grained power gating that didn't make the cut for Polaris. Maybe added support for additional display outputs VR might use. So yeah, Vega looks like a big, tweaked Polaris, unless some of those patents simply missed the Polaris deadline.
 
Although there is precedent for a new architecture occupying the top of a Radeon product stack while a minor evolution of the previous architecture occupied the lower positions (the reference being the 6800 and 6900 series, introduced a couple of months apart).

AMD might have reduced the software-visible changes in Polaris so that it would be suitable for the backwards-compatible PS4 'Neo', while Vega could have been built without those restrictions.
 
As long as compute can preempt graphics, that's already all that's needed.
But isn't compute preempting graphics the hardest kind of preemption? Because there's a lot of state in the non-shader pipeline that needs to be taken care of one way or the other?
 
But isn't compute preempting graphics the hardest kind of preemption? Because there's a lot of state in the non-shader pipeline that needs to be taken care of one way or the other?
Hrm, not if we assume that the preempted warps are guaranteed to be re-instantiated on the very same SMM, and if we allow the entire graphics pipeline to stall as a whole, retaining its state. And given that Nvidia has only announced preemption of graphics in favor of compute, without losing a single word about rescheduling any workload, that sounds plausible enough to me. Yes, it also sounds like a band-aid fix, but I somewhat doubt Nvidia would provide more than that.
 
Hrm, not if we assume that the preempted warps are guaranteed to be re-instantiated on the very same SMM, and if we allow the entire graphics pipeline to stall as a whole, retaining its state. And given that Nvidia has only announced preemption of graphics in favor of compute, without losing a single word about rescheduling any workload, that sounds plausible enough to me. Yes, it also sounds like a band-aid fix, but I somewhat doubt Nvidia would provide more than that.

I think it might be undesirable to implement that kind of half-measure. At least from the point of view of the OS, we have a perfectly happy graphics context trundling along, reading commands, generating and responding to CPU interrupts, that suddenly goes unresponsive in favor of a separate compute kernel whose visibility and timing behavior with regard to the OS is unclear.

If the graphics context and pipeline can be truly frozen, which means in-flight data will reach some kind of storage location, then that frozen data is in a position to be context-switched. Potentially, having that pipeline state fully characterized means that a parallel path could be set up to allow something to run concurrently.

Just freezing it means that if the OS is getting antsy about GPU responsiveness, the graphics pipeline might cause the OS to restart the device.
 
Just freezing it means that if the OS is getting antsy about GPU responsiveness, the graphics pipeline might cause the OS to restart the device.
Yes, I suspect that as well. So using the high-priority context would be highly volatile.
But it's still an improvement over Maxwell, where a high-priority compute context would have to wait for an SM to ramp down entirely first, at minimum finishing all active draw calls and hence letting the entire graphics pipeline run empty, with nondeterministic latency on top. Freezing the pipeline is definitely better than draining it.
 
Yes, I suspect that as well. So using the high-priority context would be highly volatile.
But it's still an improvement over Maxwell, where a high-priority compute context would have to wait for an SM to ramp down entirely first, at minimum finishing all active draw calls and hence letting the entire graphics pipeline run empty, with nondeterministic latency on top. Freezing the pipeline is definitely better than draining it.
I meant varying levels of the OS blanking out the screen, killing the application, restarting the driver, or possibly a hard system crash.
The legacy of that context and hardware being large and integrated with the functions of the GPU as a system device might explain why it took longer to get graphics preemption right.

There are two somewhat different use cases of preemption here. Graphics preemption came up in the discussion of Kaveri's and Carrizo's HSA driver implementation, in terms of a possible DoS of the graphics path--which would freak out the driver/OS.
For compute it was important as well, because AMD's HSA ambitions included very long kernel run times--which could also freak out the driver/OS.
Nvidia's Pascal paper mentions that for its compute preemption, as well as debugging.
Self-contained compute kernel wavefronts can preempt cleanly, and it applies more readily to the compute path since it's compartmentalized.
Preempting a graphics wavefront still leaves that global graphics entity persisting behind it.

From that point of view, just making the GPU stop responding while a compute shader pinkie-swears it will be done before the OS kills the application seems risky.
 
What needs to happen to the non-shader pipeline state?
If you want to interrupt the rendering of a huge triangle, you need to save somewhere which parts have already been rendered and which have not. Similarly, you'd need to save the configuration of ROP blenders. Etc.
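Purely to illustrate that argument, here's a conceptual sketch of the kind of non-shader state that would have to be captured to stop mid-triangle. None of these fields correspond to any vendor's actual context layout; it's just the enumeration from the sentence above, written as code.

```cuda
#include <cstdint>
#include <cstdio>

// Conceptual only: the sort of fixed-function context a mid-triangle graphics
// preemption would have to save and restore, on top of the shader registers.
struct RasterPreemptState {
    float    triVerts[3][4];       // the in-flight triangle, post-transform
    uint32_t nextTileX, nextTileY; // how far the rasterizer has walked it
    uint32_t scissor[4];           // active scissor rectangle
    uint32_t blendCtrl[8];         // per-render-target ROP blend configuration
    uint32_t depthStencilCtrl;     // depth/stencil test state
    uint32_t pendingRopWrites;     // exports issued but not yet retired
};

int main()
{
    // Multiplied across every rasterizer and ROP partition on the chip, even
    // this toy version adds up -- which is the point being made above.
    printf("%zu bytes of made-up state\n", sizeof(RasterPreemptState));
    return 0;
}
```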
 
regarding preemption, from Nvidia Pascal whitepaper page 9:
Compute Preemption is another important new hardware and software feature added to GP100 that allows compute tasks to be preempted at instruction-level granularity, rather than thread block granularity as in prior Maxwell and Kepler GPU architectures. Compute Preemption prevents long-running applications from either monopolizing the system (preventing other applications from running) or timing out. Programmers no longer need to modify their long-running applications to play nicely with other GPU applications. With Compute Preemption in GP100, applications can run as long as needed to process large datasets or wait for various conditions to occur, while scheduled alongside other tasks. For example, both interactive graphics tasks and interactive debuggers can run in concert with long-running compute tasks
then page 30:
The new Pascal GP100 Compute Preemption feature allows compute tasks running on the GPU to be interrupted at instruction-level granularity, and their context swapped to GPU DRAM. This permits other applications to be swapped in and run, followed by the original task’s context being swapped back in to continue execution where it left off.
Compute Preemption solves the important problem of long-running or ill-behaved applications that can monopolize a system, causing the system to become unresponsive while it waits for the task to complete, possibly resulting in the task timing out and/or being killed by the OS or CUDA driver. Before Pascal, on systems where compute and display tasks were run on the same GPU, long-running compute kernels could cause the OS and other visual applications to become unresponsive and non-interactive until the kernel timed out. Because of this, programmers had to either install a dedicated compute-only GPU or carefully code their applications around the limitations of prior GPUs, breaking up their workloads into smaller execution timeslices so they would not time out or be killed by the OS.
Indeed, many applications do require long-running processes, and with Compute Preemption in GP100, those applications can now run as long as they need when processing large datasets or waiting for specific conditions to occur, while visual applications remain smooth and interactive—but not at the expense of the programmer struggling to get code to run in small timeslices.
Compute Preemption also permits interactive debugging of compute kernels on single-GPU systems. This is an important capability for developer productivity. In contrast, the Kepler GPU architecture only provided coarser-grained preemption at the level of a block of threads in a compute kernel. This block-level preemption required that all threads of a thread block complete before the hardware can context switch to a different context. However when using a debugger and a GPU breakpoint was hit on an instruction within the thread block, the thread block was not complete, preventing block-level preemption. While Kepler and Maxwell were still able to provide the core functionality of a debugger by adding instrumentation during the compilation process, GP100 is able to support a more robust and lightweight debugger implementation.
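In code terms, the "wait for various conditions to occur" case the whitepaper describes looks roughly like the sketch below. This is my own minimal example, not from the whitepaper; the flag-polling pattern and the names are mine. On a pre-Pascal display GPU a kernel like this risks being killed by the watchdog; with GP100's instruction-level compute preemption it can instead be context-switched out to GPU DRAM while it waits.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// "Ill-behaved" long-running kernel: spins until the host sets *flag.
__global__ void waitForHost(volatile int *flag)
{
    while (*flag == 0) { /* monopolizes its SM unless the GPU can preempt it */ }
}

int main()
{
    cudaSetDeviceFlags(cudaDeviceMapHost);

    // Pinned, mapped host memory so host and device share the same flag.
    int *hFlag = nullptr, *dFlag = nullptr;
    cudaHostAlloc((void **)&hFlag, sizeof(int), cudaHostAllocMapped);
    *hFlag = 0;
    cudaHostGetDevicePointer((void **)&dFlag, hFlag, 0);

    waitForHost<<<1, 1>>>(dFlag);   // runs indefinitely, by design

    // ...arbitrary host-side work here; on GP100 other GPU contexts (graphics,
    // a debugger, another compute app) can be swapped in meanwhile...

    *hFlag = 1;                     // finally release the kernel
    cudaDeviceSynchronize();
    cudaFreeHost(hFlag);
    printf("kernel finished after the host released it\n");
    return 0;
}
```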
 
If you want to interrupt the rendering of a huge triangle, you need to save somewhere which parts have already been rendered and which have not. Similarly, you'd need to save the configuration of ROP blenders. Etc.
A compute shader doesn't interact with that stuff though.

https://community.amd.com/community/gaming/blog/2016/03/28/asynchronous-shaders-evolved

I can't see any timings there, nor a precise description of what happens when preemption occurs. It appears that work is left to drain out of the shaders, before the new task takes over.
 
It's possibly not just the shaders in the preemption case, since the compute submission isn't allowed to ramp until the graphics portion goes to zero.
The command processor has run ahead an unknown number of commands in the queues; in-flight state changes, their ordering, barriers, and pending messages to and from the rest of the system all need to be resolved.
Graphics state rollovers involve a similar drain, though not necessarily for every change. I wonder how that interacts with a graphics preemption, or whether the state changes significant enough to matter are the very ones that restrict concurrency among graphics wavefronts in the first place.

I'm curious if the quick response queue can really guarantee that kind of resource ramp, or the floor value for graphics utilization. Does having that mode on mean the driver or GPU is quietly reserving wavefront slots away from one function or the other to make sure they are available?
 