That's the real point though - a lot of the hardware requirements stuff here interacts inextricably with the OS. Sure, you can play some games in software and drivers, but ultimately the operating system owns your cores, memory, and scheduling.
I will have to nibble around the periphery of this topic until I can collect my thoughts on it. It's a quite wide range of concerns and objectives.
Some questions arise at this point.
What is meant by the OS?
Which OS?
The Windows 7 installation on a PC? The first, second, or third OS on the Xbox One?
The secure domain running on some chips, the TPM-enabled system monitor's storage, the host VM, a low-level firmware for a Thing on the Internet Of Things (*wince*)?
Given the range of vendors HSA attempts to entice, there is a lot of variation in just what these operating systems tolerate, variation that nonetheless needs to be abstracted away. How many of these operating systems provide mechanisms for interoperability, or even have a desire to attempt to provide them?
I thought HSA served as a virtual execution context, a virtual ISA and basic execution space that would allow a programmer a way to tractably reason through an implementation without knowing a lot of system particulars, some of which many operating systems would not care to deal with.
What those things are is partly along the lines of my questions to the claim of ownership over cores, memory, and scheduling.
Does the OS own the DRAM buffer on my SSD?
To what extent did it own the frame buffer of older video cards when the aperture was elsewhere?
In the case of Transmeta and Nvidia's code-morphing CPUs, what of their translation caches in DRAM?
The scratchpads of innumerable licensed DSP cores, VLIW doodads, offload engines, system processors, etc.
There are a number of exceptions I can think of to core ownership, or to the specifics of it. (Besides virtualization, containers, or other cases where we need to ask which OS is the OS.)
Cell SPEs could be dropped early in the boot process into a secure mode that almost nothing could interact with. There are cases of mobile devices with shadow cores that step in at times of low activity. Perhaps all of them use a software driver, but when it comes to power management and idling and gating cores, hardware or an on-chip firmware engine can make a lot of the calls without trekking over to the OS for anything beyond general policy (which some can ignore).
Multiple x86 SOCs now have power states that do more than the OS is led to believe, although whether that is a loss of ownership or just a longer leash is a point of debate.
As far as scheduling goes, in light of the various secure domains, low-level sleight of hand, and existing shader engines:
Perhaps it should be owned by the OS, whichever one that is, but we have examples where that is at least partly not the case.
Separate from HSA, looking at the interest in securing platforms, some of these parties may not trust the OS enough to give it the same visibility or ownership over resources as others. Power management has nibbled away at the periphery as well.
My weak understanding of HSA is that it tries to appeal to platforms and vendors that can differ on a lot of this, as plausible a goal as that may or may not be.
An HSA scenario can have a program whose execution spans an x86 core, an embedded processor with a DMA-managed scratchpad, an ARM processor, a GPU shader running its own run list, and a VLIW core with a legacy in low-level embedded.
They do not agree on the order in which their writes become visible, they do not have the same ISA, the same virtual memory handling, the same coherence protocols, signaling methods, interrupt methods, or OS visibility, and they have varying numbers of memory spaces, etc. Absent some of the aforementioned HSA requirements, they wouldn't even have a way to agree on what time something happened.
However, this program wants to run with much of this abstracted as if it were a regular user space with well-behaved memory, without invoking privileged operations or relying on OS facilities that might not be there in the form expected, because we don't know which OS it is and we may not know which x86, embedded core, random processor it is.
A lot of the quirks, like the very relaxed consistency model and the various atomics and timers, may point to how widely the players vary in quality and sophistication.
Why? Perhaps those quirky cores have something of value that would be a bit more useful if they could readily be peers for certain parts of the program.
Or some of those vendors have kind of crummy IP, or IP they want to leverage, and they don't want to validate that part of the protocol over every OS and host vendor. Or they don't want to give up certain conventions or use a specific OS.
Or they don't want to become dependent on a specific x86 vendor.
Or they are a specific x86 vendor and they don't know how long they can keep being an x86 vendor.
Or, drugs, maybe.
For instance, it doesn't matter whether or not HSA says you need to be able to preempt; you can't be a driver on most sensible operating systems without that.
What interests me is that this may be a rather late addition, perhaps in part because AMD realized its attempted HSA standard bearer in Kaveri could not manage it.
Now in the defense of HSA 1.0, it does have some stricter requirements than current OSes - which is fine - but it doesn't really change my criticism: this is all stuff that is really up to the operating system and will eventually be required there as well. You can't do an end run around the OS on any of this - it all has to live on top of the operating system in the end, which makes the concept of a "system architecture" sort of suspect. There are some nods to this in the spec related to "platforms that implement HSA profile XYZ".
Getting low-level behaviors consistent on a homogeneous host system can be challenging enough. (Going by Linus Torvalds's rants on RWT about debugging weakly-ordered systems, as an example.)
Stipulating for now that it is worth the effort to provide a platform upon which processors that cannot agree under normal circumstances on how to spinlock can accomplish work:
Should it really be on the OS vendors to figure out this mess across a universe of implementations?
What if they agreed on a common method, hardware and software vendors alike?
Like, what if the OSes agreed on providing a userspace queueing facility?
Then they'd probably need to agree on a lingua franca since they have so many ISAs.
But their atomics don't agree - LL/SC here, CAS there.
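If one imagines what such a userspace queueing facility might look like, a CAS loop is about the only primitive everyone can map onto (LL/SC implements CAS easily; the reverse is not true). A minimal sketch - the `UserQueue`/`submit` names and packet layout are my invention, loosely in the spirit of HSA's user-mode AQL queues:

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical user-mode submission queue: producers claim ring slots using
// nothing stronger than compare-and-swap, the least common denominator
// across LL/SC and CAS ISAs.
struct Packet { uint64_t kernel; uint64_t args; };

struct UserQueue {
    static constexpr uint64_t SIZE = 256;  // power-of-two ring
    Packet ring[SIZE];
    std::atomic<uint64_t> write_index{0};  // bumped by any producer
    std::atomic<uint64_t> read_index{0};   // advanced by the consuming agent

    // Claim a slot and fill it; returns false if the ring is full.
    bool submit(const Packet& p) {
        uint64_t w = write_index.load(std::memory_order_relaxed);
        for (;;) {
            if (w - read_index.load(std::memory_order_acquire) >= SIZE)
                return false;              // ring full
            // CAS claims index w; on failure, w is reloaded and we retry.
            if (write_index.compare_exchange_weak(w, w + 1,
                                                  std::memory_order_relaxed))
                break;
        }
        ring[w % SIZE] = p;                // fill the claimed packet
        // A real AQL queue would now publish the packet with a release store
        // to its header and ring a doorbell; omitted in this sketch.
        return true;
    }
};
```

Nothing here needs the OS on the submission path, which is presumably the attraction - and nothing here needs more than a shared memory model and CAS, which is presumably why the spec bottoms out where it does.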
So really while HSA hardware requirements are all well and good, I don't see how they add anything to the ecosystem. We were all already moving towards this stuff before HSA ever came along and while assembling a list outside of the context of a given platform has some value, it's hardly revolutionary.
Some of that may depend on who We is.
Progress is not uniform, and a lot of players are aware of the We they stand to lose to.
Much of this is more about what I think the motivations may be.
I do agree that the edifice being built may have inherited too much from the obvious APU basis that the biggest pusher used as its inspiration, and may be overly accommodating of the widely varying quality of implementations.
Its premise is to make things that traditionally could not be peers to CPUs become equals of a sort, at least in user space.
This does assert that many of these items deserve even some of the legitimacy of CPU host platforms, which I will just leave as an item of debate.
Perhaps I can try to mull over some of the architectural points, and where it might not appeal to my whimsy. Too sane, perhaps, for what it aims to do?
Maybe in a later post to save length here.