What are the true structures of the next-gen APUs?

That's the case.

It depends on how you interpret Cerny.

Mark Cerny: That is bad leaks and not any sort of formal evangelisation. The point is the hardware is intentionally not 100 per cent round. It has a little bit more ALU in it than if you were thinking strictly about graphics. As a result of that you have an opportunity, almost like an incentivisation, to use that ALU for GPGPU.

At first I read it the way others did, but the more I re-read it, the more I agree: he is just talking about shifting some of the total CUs to non-graphics use. It sounds like Sony has more plans for GPGPU down the road.
 
[This is asking about HSA features, *not* the type of CPU Core:]

So a second question about the structure of the next-gen APUs: are both Kabini-like (not full HSA), is one or both Kaveri-like (partial HSA), or is one or both "full" HSA (fully shared memory with context switching)?

My guess: the first is likely, the second is a maybe, and the third is a no?



So here is an article which is one reason I am asking for clarification:

http://www.extremetech.com/gaming/1...u-memory-should-appear-in-kaveri-xbox-720-ps4

http://www.extremetech.com/ said:
When will we see it? (And what will it mean?)

When asked, AMD stated that Kaveri (due in the second half of 2013) will be the first chip to use these second-generation HSA features. The G-series embedded parts announced last week, based on Kabini, will not. I’m going to go out on a limb and say I’ll be surprised if this new technology doesn’t show up in the Xbox Durango and PS4, even if the graphics cores in those products are otherwise based on GCN.

Why? Because it makes perfect sense for Microsoft and Sony to adopt this technology. The ability to exchange data and maintain coherency between CPU and GPU is a major benefit in console operations. A recent interview with Mark Cerny at Gamasutra seems to confirm that the PS4 at least will employ AMD’s hUMA tech.

AMD even spoke, at one point, about the idea of using an embedded eDRAM chip as a cache for GPU memory — essentially speaking to the Xbox Durango’s expected memory structure. The following quote comes from AMD’s HSA briefing/seminar:

“Game developers and other 3D rendering programs have wanted to use extremely large textures for a number of years and they’ve had to go through a lot of tricks to pack pieces of textures into smaller textures, or split the textures into smaller textures, because of problems with the legacy memory model… Today, a whole texture has to be locked down in physical memory before the GPU is allowed to touch any part of it. If the GPU is only going to touch a small part of it, you’d like to only bring those pages into physical memory and therefore be able to accommodate other large textures.

With a hUMA approach to 3D rendering, applications will be able to code much more naturally with large textures and yet not run out of physical memory, because only the real working set will be brought into physical memory.”

This is broadly analogous to hardware support for the MegaTexturing technology that John Carmack debuted in Rage.



So I get the feeling there is something more to PS4 and/or Xbox One than just Kabini technology. Does this make sense?
 
Coherent traffic is handled by the Onion bus, with throughput that is on the same order as what the CPU can handle.
If by traditional workload you mean rendering, there's the full-bandwidth non-coherent Garlic bus.

The coherent bus would be reserved for a minority of the traffic, including work that cannot tolerate the full latency of the GPU memory subsystem, and command/synchronization traffic. It may be possible to do the bulk of the compute traffic over the Garlic bus, and then set a completion flag over the Onion bus once done.


For Jaguar, the primary coherence endpoint is the L2; the L2 interface in particular is responsible for managing it.
For the GPU, the primary coherence structure is likewise its L2; possibly the interface or memory subsystem that tracks its misses is also what determines which accesses need to be coherent.

The Jaguar cores and CUs don't snoop each other; it's all handled at the L2 miss level, with the Onion bus and (I believe) the IOMMU as the intermediaries.
That's many fewer clients than there are cores or CUs.



AMD has promised probe filters and the like for fully HSA-enabled APUs. I'm not sure what would be present for Orbis or Durango.
As far as the GPU caches go, the GPU memory subsystem is capable of handling way more contention than the L2 interface will permit out of the Jaguar modules. The separate coherent and non-coherent bus setup isn't there because the GPU can't handle the traffic.

I see. Thanks.

It would be funny if "14+4" ended up in the PS4 documentation because AMD rolled one solution for both the XB1's and PS4's early beta kits, with the "+4" referring to the additional CUs in the final PS4 design.
 
It depends on how you interpret Cerny.
If you want to willfully misinterpret the available data and Sony's (or Cerny's) statements, then yes, it depends. Otherwise it is pretty clear, and the only logical conclusion, that there are 18 uniform CUs. It makes no sense at all to have some 14+4 substructure.
 
Sigh, yes, all the CUs are completely identical!
I have been told that by someone with first hand knowledge of the hardware and the specs.

The 14+4 thing stems from a simple suggestion Sony made to devs at one of its devcons on how they might like to split their rendering and compute workloads, and has been completely blown out of proportion by speculation.
 