HSA Discussion thread

Hi Guys - I'm James Prior, an AMD employee. I am creating this thread in response to Andrew Lauritzen's request for a space to discuss more about HSA. I do not work for the HSA Foundation, and I am not AMD's spokesperson for HSA; I'm just trying to enable discussion and bring some clarity about what HSA is.

The website is here - http://www.hsafoundation.com/

"Heterogeneous System Architecture (HSA) Foundation is a not-for-profit industry standards body focused on making it dramatically easier to program heterogeneous computing devices. The consortium comprises various software vendors, IP providers, and academic institutions and develops royalty-free standards and open-source software.

The HSA Foundation members are building a heterogeneous compute software ecosystem built on open, royalty-free industry standards and open-source software: the HSA runtimes and compilation tools are based on open-source technologies such as LLVM and GCC.

The HSA Foundation seeks to create applications that seamlessly blend scalar processing on the CPU, parallel processing on the GPU, and optimized processing on the DSP via high bandwidth shared memory access enabling greater application performance at low power consumption. The HSA Foundation is defining key interfaces for parallel computation utilizing CPUs, GPUs, DSPs, and other programmable and fixed-function devices, thus supporting a diverse set of high-level programming languages and creating the next generation in general-purpose computing."

There are two AMD chips with HSA features: "Kaveri" (hUMA, hQ) and "Carrizo" (HSA 1.0 specification).

Disclaimer
I can't speak to products from other HSA Foundation member companies.
I will freely admit to not being a programming expert! So I may go ask other people to find information to answer questions, which may introduce delays in response :)

Let the discussion begin!
 
Thanks for starting the thread James and welcome to Beyond3D!

So after much confusion over what exactly HSA is in the press for the past couple years, the spec finally came out and after skimming it I'm still left with a fair amount of confusion/questions/criticisms. Let me get a few high level things out to start the discussion.

A good place of commonality to start, I think, is the hardware requirements; here's the relevant HSA page:
http://www.hsafoundation.com/html/H...h/Topics/01_Overview/list_of_requirements.htm

Now most folks would look at those things and nod and say "yeah this is all reasonable stuff that we've all been working towards for at least the last 5+ years". Obviously people eventually want shared virtual memory... even from the earliest days of GPGPU that was known, it's just a question of when it becomes viable in hardware. With UMA architectures it's a no-brainer, although it obviously still costs hardware. Similarly you want the ability to be cache coherent, you obviously want atomics, any reasonable implementation on top of a multitasking OS needs preemption at some decent granularity, etc. So sure, while folks might prioritize differently no one that I know would fundamentally disagree with any of this stuff, and it's all stuff we've been iterating on for a long time. ex. WDDM has had requirements around a lot of these things from the start that are gradually made more stringent and capable as time passes... I can only assume the same is true for other operating systems.

That's the real point though - a lot of the hardware requirements stuff here interacts intractably with the OS. Sure you can play some games in software and drivers, but ultimately the operating system owns your cores, memory and scheduling. For instance, it doesn't matter whether or not HSA says you need to be able to preempt, you can't be a driver on most sensible operating systems without that.

Now in the defense of HSA 1.0, it does have some stricter requirements than current OSes - which is fine - but it doesn't really change my criticism: this is all stuff that is really up to the operating system and will eventually be required there as well. You can't do an end run around the OS on any of this - it all has to live on top of the operating system in the end, which makes the concept of a "system architecture" sort of suspect. There's some nods to this in the spec related to "platforms that implement HSA profile XYZ".

So really while HSA hardware requirements are all well and good, I don't see how they add anything to the ecosystem. We were all already moving towards this stuff before HSA ever came along and while assembling a list outside of the context of a given platform has some value, it's hardly revolutionary.

Now to get a more pointed criticism... HSA basically took the CUDA/OpenCL/DirectCompute execution model and all of its flaws and baked it into a system architecture specification! Yep, that includes defining bit patterns for the equivalent of clEnqueueNDRangeBlahBlah:
http://www.hsafoundation.com/html/H.../Topics/02_Details/kernel_dispatch_packet.htm
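For concreteness, the dispatch packet being bit-defined there is a 64-byte structure roughly like this (paraphrased from the spec and the public runtime headers; take the exact field names as approximate rather than authoritative):

```c
#include <stdint.h>

/* Rough sketch of the 64-byte AQL kernel dispatch packet, paraphrased from the
   spec / public runtime headers; handle types flattened to plain integers here. */
typedef struct {
    uint16_t header;               /* packet type, barrier bit, acquire/release fence scopes */
    uint16_t setup;                /* number of dispatch dimensions (1-3) */
    uint16_t workgroup_size_x;     /* work-group dimensions, in work-items */
    uint16_t workgroup_size_y;
    uint16_t workgroup_size_z;
    uint16_t reserved0;
    uint32_t grid_size_x;          /* total grid dimensions, in work-items */
    uint32_t grid_size_y;
    uint32_t grid_size_z;
    uint32_t private_segment_size; /* per-work-item scratch, in bytes */
    uint32_t group_segment_size;   /* per-work-group shared local memory, in bytes */
    uint64_t kernel_object;        /* handle to the finalized kernel code */
    uint64_t kernarg_address;      /* pointer to the kernel argument block */
    uint64_t reserved2;
    uint64_t completion_signal;    /* signal handle decremented when the dispatch completes */
} kernel_dispatch_packet_t;        /* total: 64 bytes */
```

i.e. work-group sizes, grid dimensions and an explicit group (shared local) memory size are now literal bit offsets in a "system architecture" document.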

Many folks agree at this point that certain things like multi-dimensional dispatch and arguably explicit scratch pad shared local memory probably need to be changed. (I'd argue the execution model itself is clearly insufficient given that even 3 years ago we knew that the best way to write code was to entirely bypass it.) At a SIGGRAPH panel last year there was broad agreement among IHVs to this end. Yet hilariously, these mistakes/old execution model stuff is fundamentally baked into the system architecture of HSA itself. Even if you think the compute execution model as it is today is completely sufficient (in which case I'd argue that you simply haven't written very much high performance code), I should think that most people can see why a "system architecture specification" should not be spelling out stuff like this.

Anyways as this post is already getting long and ranty (I saved the best for last ;)) I'll stop here and let others chime in. Suffice it to say, I don't think anyone would really debate the hardware requirements direction of the specification, but I don't see how HSA changes anything that we've already been doing on that front. From a rough skim it seems like Broadwell probably conforms to most of the requirements in that list... so is it an "HSA platform" already? Or does it at least have "HSA features" I guess? There is still an OS and drivers involved so the rest of the bit pattern stuff is basically irrelevant - any driver could convert that to the relevant hardware commands today.

Fundamentally there will always be at least an OS and probably a thin driver involved to do any scheduling of these resources because as I mentioned, the OS owns all this stuff.
 
When it was first announced, wasn't HSA really about GPU vendors correctly identifying that building their own CUDA-class GPU compute toolchains from scratch was going to be a massive ($$$) effort?

Here's the timeline:
  • June 2012: The HSA panel got to work on defining, "... cross-platform, cross-OS means of ensuring maximum heterogeneous application portability and optimization."
  • Sept 2012: LLVM/Clang takes off everywhere and an OpenCL implementation shortcut was identified ... SPIR.
  • 2013-2014: HSA still working
  • Mar 2015: Compute-focused HSA 1.0 and its PTX'ish HSAIL are announced
  • Mar 2015: Compute+graphics SPIR-V announced by Khronos.
Meanwhile, CUDA went from 4.2>5.0>5.5>6.0>6.5>7.0.

Time flies, architectures evolve and building toolchains remains expensive.
 
FWIW, I agree with Andrew in general. I believe that what AMD dearly needed for the HSA effort, and did not manage to genuinely get on board, was precisely a big software ecosystem player, i.e. someone who owns an OS and who knows how to build software and toolchains that developers use. And by developers I don't mean students or professors looking for the next publication or hobbyists that want to blog. IMHO, it's become pretty apparent that IHV expertise when it comes to programming models is dubious / hugely biased towards a particular model, i.e. the CUDA one (and HSA did not, unfortunately, achieve escape velocity).

This is pretty cute when you're rolling GEMM, SAXPY or some glorified form of pixel shading, but it becomes less fun when you start looking at non-rigid scheduling (e.g. work-stealing), non-trivially partitionable, asymmetrical workloads. It's quite sad once you consider non-trivial software infrastructures that end up involving non-owned, potentially opaque libraries, where someone might just have thrown in a barrier or 10 just to piss you off when calling some function in your code and trying to branch around it (for example). Finally, there's the lack of longer term planning: currently there's a lot of rah-rah about SVM, which basically throws the baby out with the bath-water somewhat as it appears nobody cares about distributed memory scenarios anymore, although those are valid and interesting in and of themselves. But I'll give HSA its due: it's more honest about counting cores and not treating GPUs as magical versus e.g. CPUs, that's nice.
 
As I've pointed out numerous times ... seems like AMD have been working on the HSA Windows drivers, I'm guessing for Windows 10 (Threshold), since 2011 ...

Hopefully we'll see the full end to end system come win10 launch ?!


 
That's the real point though - a lot of the hardware requirements stuff here interacts intractably with the OS. Sure you can play some games in software and drivers, but ultimately the operating system owns your cores, memory and scheduling.
I will have to nibble around the periphery of this topic until I can collect my thoughts on it. It's a quite wide range of concerns and objectives.
Some questions arise at this point.
What is meant by the OS?
Which OS?
The Windows 7 installation on a PC. The first, second, or third OS on the Xbox One?
The secure domain running on some chips, the TPM-enabled system monitor's storage, the host VM, a low-level firmware for a Thing on the Internet Of Things (*wince*)?
Given the range of vendors HSA attempts to entice, there is a lot of variation on just what these operating systems tolerate, but which need to be abstracted away. How many of these operating systems provide mechanisms for interoperability, or have a desire to even attempt to provide it?
I thought HSA served as a virtual execution context, a virtual ISA and basic execution space that would allow a programmer a way to tractably reason through an implementation without knowing a lot of system particulars, some of which many operating systems would not care to deal with.

What those things are is partly along the lines of my questions to the claim of ownership over cores, memory, and scheduling.
Does the OS own the DRAM buffer on my SSD?
To what extent did it own the frame buffer of older video cards when the aperture was elsewhere?
In the case of Transmeta and Nvidia's code-morphing CPUs, what of their translation caches in DRAM?
The scratchpads of innumerable licensed DSP cores, VLIW doodads, offload engines, system processors, etc.

There are a number of special cases I can think of to core ownership, or the specifics of it. (Besides virtualization, containers, or other cases where we need to ask which OS is the OS.)
Cell SPEs could be dropped early in the boot process into a secure mode that almost nothing could interact with. There are cases of mobile devices with shadow cores that step in at times of low activity. Perhaps all of them use a software driver, but when it comes to power management, idling and gating cores, hardware or an on-chip firmware engine can make a lot of calls without trekking over to the OS for anything beyond general policy (which some can ignore).
Multiple x86 SOCs now have power states that do more than the OS is led to believe, although whether that is a loss of ownership or just a longer leash is a point of debate.

As far as scheduling goes, in light of the various secure domains, low-level sleight of hand, and existing shader engines:
Perhaps it should be owned by the OS, whichever one that is, but we have examples where that is at least partly not the case.

Separate from HSA, looking at the interest in securing platforms, some of these parties may not trust the OS enough to give it the same visibility or ownership over resources as others. Power management has nibbled away at the periphery as well.


My weak understanding of HSA is that it tries to appeal to platforms and vendors that can differ on a lot of this, as plausible a goal as that may or may not be.

An HSA scenario can have a program whose execution spans an x86 core, an embedded processor with a DMA-managed scratchpad, an ARM processor, a GPU shader running its own run list, a VLIW core with a legacy in low-level embedded.
They do not agree on what order their writes become visible, they do not have the same ISA, they do not have the same virtual memory handling, the same coherence protocols, signaling methods, interrupt methods, OS visibility, varying numbers of memory spaces, etc. Absent some of the aforementioned HSA requirements, they wouldn't have the ability to agree on what time something happened.
However, this program wants to run with much of this abstracted as if it were a regular user space with well-behaved memory, without invoking privileged operations or relying on OS facilities that might not be there in the form expected, because we don't know which OS it is and we may not know which x86, embedded core, random processor it is.
A lot of the quirks like the very relaxed consistency model and various atomics and timers may point to how widely varying in quality and sophistication the players may be.

Why? Perhaps those quirky cores have something useful that would be a bit more useful if they could readily be peers for certain parts of the program.
Or some of those vendors have kind of crummy IP, or IP they want to leverage, and they don't want to validate that part of the protocol over every OS and host vendor. Or they don't want to give up certain conventions or use a specific OS.
Or they don't want to become dependent on a specific x86 vendor.
Or they are a specific x86 vendor and they don't know how long they can keep being an x86 vendor.
Or, drugs, maybe.


For instance, it doesn't matter whether or not HSA says you need to be able to preempt, you can't be a driver on most sensible operating systems without that.
What interests me is that this may be a rather late addition, perhaps in part because AMD realized its attempted HSA standard bearer in Kaveri could not manage it.

Now in the defense of HSA 1.0, it does have some stricter requirements than current OSes - which is fine - but it doesn't really change my criticism: this is all stuff that is really up to the operating system and will eventually be required there as well. You can't do an end run around the OS on any of this - it all has to live on top of the operating system in the end, which makes the concept of a "system architecture" sort of suspect. There's some nods to this in the spec related to "platforms that implement HSA profile XYZ".
Getting low-level behaviors on a homogeneous host system consistent can be challenging enough. (Going by Linus Torvalds' rants on debugging weakly-ordered systems on RWT as an example.)
Stipulating for now that it is worth the effort to provide a platform upon which processors that cannot agree under normal circumstances on how to spinlock can accomplish work:
Should it really be on the OS vendors to figure out this mess across a universe of implementations?
What if hardware and software vendors agreed on a common method?
Like, what if the OSes agreed on providing a userspace queueing facility?
Then they'd probably need to agree on a lingua franca since they have so many ISAs.
But their atomics (LL/SC, CAS, and the rest) don't agree.
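To make that concrete, the user-mode queueing idea is basically the textbook producer ring: claim a slot with an atomic add, fill it in, publish the header last, ring a doorbell. A rough sketch in portable C11 atomics (this is just the shape of the mechanism, not the actual HSA runtime API):

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Conceptual sketch only: a user-mode packet ring expressed in portable C11 atomics. */

typedef struct {
    _Atomic uint16_t header;    /* packet type + fence bits; stays "invalid" until published */
    uint8_t          body[62];  /* rest of the 64-byte packet */
} packet_t;

typedef struct {
    _Atomic uint64_t write_index;   /* next free slot, bumped by producers in user space */
    _Atomic uint64_t read_index;    /* next slot the packet processor will consume */
    uint64_t         size;          /* number of slots, power of two */
    _Atomic uint64_t *doorbell;     /* stand-in for the doorbell signal */
    packet_t         *base;         /* the ring itself */
} user_queue_t;

void enqueue(user_queue_t *q, uint16_t header, const uint8_t body[62])
{
    /* 1. Claim a slot with a plain atomic fetch-add: no syscall, no driver transition.
          (A real implementation must also check for a full ring and back off; omitted.) */
    uint64_t slot = atomic_fetch_add_explicit(&q->write_index, 1, memory_order_relaxed);
    packet_t *dst = &q->base[slot & (q->size - 1)];

    /* 2. Fill in the body while the slot's header still says "invalid". */
    memcpy(dst->body, body, sizeof dst->body);

    /* 3. Publish the header last with release semantics, so the consumer never
          observes a half-written packet, then ring the doorbell. */
    atomic_store_explicit(&dst->header, header, memory_order_release);
    atomic_store_explicit(q->doorbell, slot, memory_order_release);
}
```

The rub is exactly that: every agent touching the queue has to implement that fetch-add and those release-stores with semantics the others understand, which is what the memory-model and signaling portions of the spec end up legislating.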

So really while HSA hardware requirements are all well and good, I don't see how they add anything to the ecosystem. We were all already moving towards this stuff before HSA ever came along and while assembling a list outside of the context of a given platform has some value, it's hardly revolutionary.
Some of that may depend on who We is.
Progress is not uniform, and a lot of players are aware of the We they stand to lose to.

Much of this is more about what I think the motivations may be.
I do agree that the edifice being built may have inherited too much from the obvious APU basis that its biggest pusher used as inspiration, and may be overly accommodating due to the widely varying quality of implementations.
Its premise is to make things that traditionally could not be peers to CPUs become equals of a sort, at least in user space.
This does assert that many of these items deserve even some of the legitimacy of CPU host platforms, which I will just leave as an item of debate.

Perhaps I can try to mull over some of the architectural points, and where it might not appeal to my whimsy. Too sane, perhaps, for what it aims to do?
Maybe in a later post to save length here.
 
Absent some of the aforementioned HSA requirements, they wouldn't have the ability to agree on what time something happened.
Sure, but that's my point - every sensible operating system is going to have similar requirements because otherwise it would be impossible to write code to start with. HSA brings nothing additional to the table here beyond - what - putting a stamp of approval on an operating system itself or something?

However, this program wants to run with much of this abstracted as if it were a regular user space with well-behaved memory, without invoking privileged operations or relying on OS facilities that might not be there in the form expected, because we don't know which OS it is and we may not know which x86, embedded core, random processor it is.
Fine, so it's OpenCL/middleware (which it increasingly sounds like it is). If they'd just call it that I'd nod and move on :) There are still questions around who provides drivers/SDK/etc. but ultimately that doesn't change that fact.

Why? Perhaps those quirky cores have something useful that would be a bit more useful if they could readily be peers for certain parts of the program.
No one is arguing against the fundamental utility of heterogeneous compute, fixed function hardware, etc. I'm certainly not even arguing against having a sane and semi-unified way to schedule work across varying hardware and so on. I just don't understand how HSA in its current form plans to help with any of that and it seems to have gotten bogged down in minutia long before they have answered the simple questions of what it is and how one uses it.

As per the above, I expect the answer is that in practice it's just the OpenCL model with at best some sort of pluggable ICD mechanism and maybe an HSA-provided SDK. Great, ship it on a few OSes and let people play with it. But that's certainly not how they have been positioning it in their grandiose spec publishing and marketing.

Or some of those vendors have kind of crummy IP, or IP they want to leverage, and they don't want to validate that part of the protocol over every OS and host vendor. Or they don't want to give up certain conventions or use a specific OS.
This part is completely unclear... you're not going to write a driver for a given platform so... who is exactly? This is the bit that makes no sense in the whole HSA argument - if you don't have a platform vendor onboard then someone is going to have to write a driver. Ultimately I don't buy the logic though - everyone is always going to want a thin layer of software in there at least; as you have clearly pointed out, we're a hell of a long way from something well-defined enough to nail down as an official hardware-consumed ISA.

Should it really be on the OS vendors to figure out this mess across a universe of implementations?
Yes, because they fundamentally have to sort it out to start with, and have been for many years. Again, HSA doesn't add any value here at all as it by necessity lives *on top of* the OS mechanisms that must exist to start with.

Much of this is more about what I think the motivations may be.
Oh the motivations are pretty clear I think, and I don't think they are nefarious. I just think the whole thing sort of hinges on having a platform or two onboard and since they don't appear to have gotten that yet it's not really much different from OpenCL or any other similar API in practice.
 
This is pretty cute when you're rolling GEMM, SAXPY or some glorified form of pixel shading, but it becomes less fun when you start looking at non-rigid scheduling (e.g. work-stealing), non-trivially partitionable, asymmetrical workloads. It's quite sad once you consider non-trivial software infrastructures that end up involving non-owned, potentially opaque libraries, where someone might just have thrown in a barrier or 10 just to piss you off when calling some function in your code and trying to branch around it (for example).
Yeah the whole model is almost completely intractable beyond the pixel-shader-like kernel cases. Nesting and separate compilation are kind of fundamental and yet the current language execution models can't cleanly support them without several performance cliffs (to some extent hardware too, but not as crippling as the software currently).

Frankly it should have been a red flag to everyone 5+ years ago when we started realizing that persistent threads/warp scheduled type code is the only way to get efficiency on non-trivial kernels. (Check the SIGGRAPH talks, etc; this has all been well known for a while now.) But instead of taking a step back to re-evaluate the original CUDA design choices, people just kept blindly copying the same mistakes over and over in new "specs" and APIs.
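(For anyone who hasn't run into the term: "persistent threads" just means launching only as many workers as the machine can actually keep resident and having them pull work items themselves, instead of launching one logical thread per item and trusting the dispatcher. A quick CPU-side sketch of the same pattern in plain C, purely to illustrate the shape of it:)

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

/* CPU illustration of the "persistent threads" pattern: a fixed pool of workers
   (one per execution resource) pulls items from a shared counter itself, rather
   than relying on a one-worker-per-item dispatch. */

#define NUM_WORKERS 8          /* stand-in for "number of resident workgroups" */
#define NUM_ITEMS   100000

static _Atomic long next_item = 0;

static void process(long item) { (void)item; /* ... the actual kernel body ... */ }

static void *persistent_worker(void *arg)
{
    (void)arg;
    for (;;) {
        /* Grab the next work item directly; the worker makes its own scheduling
           decisions (chunking, stealing, reordering) with no per-item launch cost. */
        long item = atomic_fetch_add_explicit(&next_item, 1, memory_order_relaxed);
        if (item >= NUM_ITEMS)
            return NULL;
        process(item);
    }
}

int main(void)
{
    pthread_t pool[NUM_WORKERS];
    for (int i = 0; i < NUM_WORKERS; ++i)
        pthread_create(&pool[i], NULL, persistent_worker, NULL);
    for (int i = 0; i < NUM_WORKERS; ++i)
        pthread_join(pool[i], NULL);
    printf("done\n");
    return 0;
}
```

The point being that once your kernels work this way, the grid/workgroup dispatch machinery baked into the packet format above is mostly something you are routing around.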

On that point I can't criticize HSA specifically any more than any other language (OpenCL, DX compute, etc.) as they all did the same thing. But the notion that the execution model is strong enough to bake into a "system architecture specification" that we're going to use for decades to come is laughable.
 
Sure, but that's my point - every sensible operating system is going to have similar requirements because otherwise it would be impossible to write code to start with. HSA brings nothing additional to the table here beyond - what - putting a stamp of approval on an operating system itself or something?
This goes to what we're assuming the background is for a number of the players HSA is attempting to appeal to, and how sensible their environments are.
Perhaps we can consider it a backdoor codification of AMD's TSC as a baseline, without necessarily depending on the OS to provide it in scenarios where the OS is on the sidelines or has disabled the function.
And this sensible solution is not universal, so it at least rules out an HSA solution built on a system like an older AMD chip or an Intel P3, unless the runtime can discover an alternate source that can update at a tick rate that conveniently bounds AMD's timekeeping mechanism.

So your point is correct that sensible systems should have a hardware source and a software service that consistently reveals it, and HSA is defining it because its broad umbrella includes the less than sensible in both hardware and software dimensions. Even if there are sensible independent solutions, why would we assume they agree?
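(For reference, the way this surfaces to user code is just a 64-bit system timestamp plus a queried frequency; something like the following, going from memory of the 1.0 runtime headers, so take the exact names with a grain of salt:)

```c
#include <hsa.h>       /* HSA runtime header; some distributions ship it as hsa/hsa.h */
#include <stdint.h>
#include <stdio.h>

/* Sketch from memory of the 1.0 runtime API: the system timestamp is a
   monotonically increasing 64-bit counter shared by all agents, converted to
   wall time via a queried frequency. */
int main(void)
{
    uint64_t ticks = 0, freq = 0;

    if (hsa_init() != HSA_STATUS_SUCCESS)
        return 1;

    hsa_system_get_info(HSA_SYSTEM_INFO_TIMESTAMP, &ticks);
    hsa_system_get_info(HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY, &freq);

    printf("timestamp = %llu ticks @ %llu Hz (%f s)\n",
           (unsigned long long)ticks, (unsigned long long)freq,
           freq ? (double)ticks / (double)freq : 0.0);

    hsa_shut_down();
    return 0;
}
```

Every agent that stamps a signal or a dispatch is expected to agree with that one counter and frequency, which is the codification being described.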

As an example, up until recently a GPU could not be trusted with a ball of string, much less be let out of its isolated state where it could get anywhere near anything important. That this is not unique in the realm of clients HSA tries to cater to will be a refrain I will use heavily.

Fine, so it's OpenCL/middleware (which it increasingly sounds like it is).
A significant portion of its abstraction aligns with that.
It's a form of middleware that seeks to get a default level of access below that of most middleware, hence why HSA at least originally posited an HSA layer below an OpenCL implementation. That was before the most recent OpenCL revisions that added some of the features HSA mandates.

The specifics, like the formats and defined methods, go to a middleware of sorts whose functions can serve as a hardware acceleration target.
Probably not coincidental: the most complex packets are bit-defined to be the length of an AMD cache line, the launch dimensions align so well with an extant heterogeneous compute device, the specification leaves room for hardware implementation of the base signal specification (and guess who happens to have a hardware solution for that), and dedicated hardware queues for what is ostensibly in memory, there are image objects, etc.

The premise of implementing the underlying machinery in hardware allows for giving the OS a very bare-bones view of what is going on, which they may or may not want involved that much.
Not every potential stakeholder is used to working at the level where they understand the pitfalls of working out a protocol of this sort, or with any number of other IP providers that might wind up in an arbitrary custom SOC.
And "ball of string" may apply in some cases.

Great, ship it on a few OSes and let people play with it. But that's certainly not how they have been positioning it in their grandiose spec publishing and marketing.
Spec publishing goes to getting an advanced understanding of interoperation at a system level when many lack the resources or will to coordinate on an ad hoc basis.
And the ball of string.
Marketing goes to it being marketing.

This part is completely unclear... you're not going to write a driver for a given platform so... who is exactly? This is the bit that makes no sense in the whole HSA argument - if you don't have a platform vendor onboard then someone is going to have to write a driver.

There would at least be one driver, I would imagine.
The finalizer would be something akin to a GPU shader compiler, when it isn't actually a GPU shader compiler.
Where it supposedly comes in is the queuing system that does its best to be low-overhead by not involving the OS.
The base 1.0 version of an HSA client's software and hardware would be necessary.
Changing the compute "meat" of a client can be divorced from having to revalidate the interoperability and interactions with system memory.
In some cases, a custom compute client might be added without any particular impact on the OS, which may simplify testing by moving some elements to a more standardized low-level interface and leaving the particulars in the finalizer and agent software as "someone else's problem".

There also wouldn't need to be an explosion of Vendor A interop with Vendor Z driver combos when it comes to making sure they queue things correctly, and since there are hardware-accelerated variants, they can be more readily plugged into configurations--or kept the same as other blocks change--as long as they follow the rules of the road.
At least one vendor has an interest in this.

If a provider for some reason has trouble meeting the spec, I imagine there's a vendor that would take money to help it along.

Yes, because they fundamentally have to sort it out to start with, and have been for many years.
For many years it was not a safe assumption to make that they had sorted out a lot of these things, and given the broad umbrella of HSA, it is not a universal.

Again, HSA doesn't add any value here at all as it by necessity lives *on top of* the OS mechanisms that must exist to start with.
There are various escape clauses for proprietary implementations that are fine if they conform to the behaviors outlined. Not everything is necessarily going back to the OS

Oh the motivations are pretty clear I think, and I don't think they are nefarious.
Trying to stay afloat is understandable.

I just think the whole thing sort of hinges on having a platform or two onboard and since they don't appear to have gotten that yet it's not really much different from OpenCL or any other similar API in practice.
It's no different from every other AMD initiative that is not taken up by a platform in that case.

There's still value in debating it outside of adoption, at least in an academic sense.
It's not like SIMD isn't worth discussing because 3DNow! fizzled.
 
As an example, up until recently a GPU could not be trusted with a ball of string, much less be let out of its isolated state where it could get anywhere near anything important. That this is not unique in the realm of clients HSA tries to cater to will be a refrain I will use heavily.
Sure, but nothing about HSA changes that. At best it's an aspirational document about what we want to see on a wide range of hardware *and operating systems*. For that purpose - if you subtract the execution model/OpenCL stuff that I complained about above - it's fine. But that's not a "spec".

Probably not coincidental: the most complex packets are bit-defined to be the length of an AMD cache line, the launch dimensions align so well with an extant heterogeneous compute device, the specification leaves room for hardware implementation of the base signal specification (and guess who happens to have a hardware solution for that), and dedicated hardware queues for what is ostensibly in memory, there are image objects, etc.
Right, but this is precisely where it gets completely into the weeds for me. All of that stuff is utterly irrelevant details compared to the fundamental model of how one writes software that runs on a platform that eventually gets run on the various compute devices. I hate to go back to my previous point but nothing here convinces me that you can do an end-run around the platform and get a useful result.

The premise of implementing the underlying machinery in hardware allows for giving the OS a very bare-bones view of what is going on, which they may or may not want involved that much.
I'm sure they don't want it involved at all, but my argument is that's naïve and silly :) Game developers don't want an OS and multitasking either, but users kind of do.

There are various escape clauses for proprietary implementations that are fine if they conform to the behaviors outlined. Not everything is necessarily going back to the OS
For something like a pure compute "coprocessor" (Xeon Phi or something) I agree. But let's get real here - the main use case is for CPU+GPU and that's highly problematic if you also want your CPU to be running an OS and the GPU to be drawing things on the screen. Unless you're arguing that you should sequester off a portion of the GPU from the OS which is where it gets back to "totally silly" territory.

I appreciate your points here and I agree that they are likely in-line with the thinking of the folks involved with HSA, but my opinion hasn't really changed here. The practical realization of these aspirations is what matters and without a platform holder to fill in the other 90% of the work and details, HSA by itself simply isn't anything new or compelling.

... and they hard-coded a busted execution model but I've ranted on that above already ;)
 
Objection! Not true. Game developers are the only ones that want real multi-tasking (concurrency), users only want the sense of it. :p
Haha, well you can never guarantee that two things run at the same time of course.

But yeah my point was more that games like to assume that they own the whole system and that the user is never going to be doing anything else at the same time, etc. A few of the bigger PC studios have started to catch on to the fact that people have multiple monitors, stream video, do VOIP, browse the web and so on more often than not these days, but folks coming from the console side typically treat the notion that they have to share the machine/resources as something to work around rather than embrace :)

As a developer, I highly prefer to just assume I own the machine too. But as a PC gamer/user, it's unacceptable to me when applications do that ;)
 
Sure, but nothing about HSA changes that. At best it's an aspirational document about what we want to see on a wide range of hardware *and operating systems*.
I saw it more as a lower bound on what would be considered competent enough to even show up to the party, which sometimes does not have much daylight between the document and AMD's architectural direction.

For that purpose - if you subtract the execution model/OpenCL stuff that I complained about above - it's fine. But that's not a "spec".
At its inception, OpenCL was more constrained, so HSA offered flexibility. OpenCL evolved in the years hence.
HSA (or was it Fusion?) was introduced when, 2011 or so? OpenCL got shared virtual memory in late 2013.
The difference was more notable years ago.

Right, but this is precisely where it gets completely into the weeds for me. All of that stuff is utterly irrelevant details compared to the fundamental model of how one writes software that runs on a platform that eventually gets run on the various compute devices.
I am not sure I disagree with that point. Perhaps AMD can get back and explain otherwise, but the primary distinctions I can draw between the middleware/OpenCL items you see no future for are essentially based on hardware/runtime implementation details.
It's not so much changing how software is written, but more that there is a range of compute agents that absent HSA would never be permitted to run it.

Uncharitably, I would say it's not really about the programmer, but about those HSA members whose primary expertise is not in the CPU and manufacturing space where someone (probably just the one) is either beating the tar out of them or poised to by engineering and economic might.
Their primary leverage is their extant IP and domain-specific knowledge, and they need something they can wrap it in so that they have any way of providing a value-add in the face of a lack of manufacturing/design improvement, inability to overcome crippling shortfalls in general-purpose capability, and for some the downside that the breakeven point for manufacturing their niche products is rising faster than their niche can provide revenue or volume, much less return on investment for additional R&D.

Preferably by making them edge towards the direction taken by the vendor(s) that so heavily inspired it.

For something like a pure compute "coprocessor" (Xeon Phi or something) I agree. But let's get real here - the main use case is for CPU+GPU and that's highly problematic if you also want your CPU to be running an OS and the GPU to be drawing things on the screen. Unless you're arguing that you should sequester off a portion of the GPU from the OS which is where it gets back to "totally silly" territory.
There's a matter of "should" and a matter of what is out there. How AMD's run list is executed is not OS-exposed.
As much as HSA might be a client compute initiative, it purports to have broader applicability, and so it serves as a PowerPoint slide for GPUs in the face of Xeon Phi, or just regular Xeons.

I appreciate your points here and I agree that they are likely in-line with the thinking of the folks involved with HSA, but my opinion hasn't really changed here.
Just to clarify, I am trying to express what I speculate is the thinking or motivation for what was done in HSA. I am not necessarily positing it as a net gain, if some of my jaundiced tone is lost in print.
As a result, I may be failing to represent HSA as its proponents would present it.

The largest gap I see in our positions has more to do with the OS, which is not so much about HSA as where I think things could be going.
For certain system elements, full involvement by the OS is too slow or high-energy, which is where DVFS and power state sleight of hand fits in.
In other scenarios, it is a question of what the OS may be, when more than one is in play.

My fear for the future has to do with where platform security comes in.
A monolithic OS is a massive attack vector against (at least) two targets.
One is the integrity of a system, where the complexity of a modern, massively featured OS presents a massive space for exploits.
The other is the ability to control what people can do with their system in ways that cannot be properly monetized.
An OS is vulnerable to attack, and perhaps more importantly to the vertically integrated device/media consumption providers, it is something that can be changed, replaced, or jailbroken.
I think at some point the secure domain, or the citadel of DRM keys and prefabbed devices, will take care of parts of the system because it will not trust the outer OS or container to have full control or awareness of some of these elements; but in the interest of minimizing attack cross-section, and out of general lack of interest, that secure hypervisor or trusted environment will not care to provide the interoperability facilities that make up part of HSA either.

I think there are elements that might be interesting or useful in certain situations or directions HSA 1.0 does not take, which may be something I can try to write about in this thread.
 
Sorry for the double post, but the prior one got a little dark at the end. I do not think that the dystopian outcome is the only thing that can come of a more fluid boundary between OS management and lower-level sleight of hand, but I think those pressures pose a near-term impetus to challenge the absolute ownership of the system by a traditional client OS.
 
Sorry for my lack of participation, I hurt my knee and have been AFK for a couple of days... hopefully I will be able to give this discussion more time this week, pending my Ortho Dr's appointment.
 
Sorry for my lack of participation, I hurt my knee and have been AFK for a couple of days... hopefully I will be able to give this discussion more time this week, pending my Ortho Dr's appointment.
Ouch! Definitely take all the time you need to recover; there is nothing time critical or pressing at all about this discussion :)
 
Hi people!

That could be an interesting thread, starting with a comprehensive list of applications utilizing HSA.
 