Digital Foundry Article Technical Discussion Archive [2013]

I didn't see it that way. They're just saying that if you're accessing random data (for a shader) on the GPU, it's going to be slower because it's not cached. I think it's a 'duh' moment... the GPU is still a GPU; you have to hide the latency by arranging the data accordingly.

The CPU, on the other hand, has a cache that allows the dev to run without doing all that mundane work... lazy bastards... JK.
 
The PS4 operates a system where memory is allocated either to the CPU or GPU, using two separate memory buses.

"One's called the Onion, one's called the Garlic bus. Onion is mapped through the CPU caches... This allows the CPU to have good access to memory," explains Jenner.

"Garlic bypasses the CPU caches and has very high bandwidth suitable for graphics programming, which goes straight to the GPU. It's important to think about how you're allocating your memory based on what you're going to put in there."

Optimising the PS4 version of The Crew once the team did manage to get the code compiling required some serious work in deciding what data would be the best fit for each area of memory.

"The first performance problem we had was not allocating memory correctly... So the Onion bus is very good for system stuff and can be accessed by the CPU. The Garlic is very good for rendering resources and can get a lot of data into the GPU," Jenner reveals.

"One issue we had was that we had some of our shaders allocated in Garlic but the constant writing code actually had to read something from the shaders to understand what it was meant to be writing - and because that was in Garlic memory, that was a very slow read because it's not going through the CPU caches. That was one issue we had to sort out early on, making sure that everything is split into the correct memory regions otherwise that can really slow you down."
I'm only able to interpret all that one way: where you 'put stuff' in RAM (dunno if it's related to address space, or who is accessing what when) needs to be managed, which sounds not a great deal different from choosing whether to put assets in VRAM or DDR3.
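
To make the Garlic/Onion split Jenner describes concrete, here's a minimal sketch of the allocation decision. The flag names (ONION_CACHED, GARLIC_WC) and the AllocVideoMem() helper are made up for illustration; this is not the actual PS4 SDK API.

    #include <cstddef>
    #include <cstring>

    // Hypothetical allocation flags and helper -- not the real SDK API,
    // just to illustrate the split described in the article.
    enum MemType { ONION_CACHED,   // goes through the CPU caches
                   GARLIC_WC };    // write-combined, high-bandwidth GPU path

    void* AllocVideoMem(std::size_t size, MemType type);   // assumed helper

    struct ShaderBlob {
        void* bytecode;    // only the GPU reads this -> Garlic
        void* reflection;  // the CPU reads this when writing constants -> Onion
    };

    ShaderBlob LoadShader(const char* src, std::size_t codeSize, std::size_t reflSize)
    {
        ShaderBlob s;
        s.bytecode   = AllocVideoMem(codeSize, GARLIC_WC);
        s.reflection = AllocVideoMem(reflSize, ONION_CACHED);
        std::memcpy(s.bytecode,   src,            codeSize);
        std::memcpy(s.reflection, src + codeSize, reflSize);
        return s;
    }

The bug they describe was effectively the reflection-style data ending up in Garlic too, so every CPU read of it was a slow uncached fetch.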
 
From what I read it just means you have a choice of how to give information to the GPU from the CPU, where the main difference is how you use the cache. Yes, it sounds similar to split memory, but optimising cache use would be a factor regardless, I assume, so it's still lowering complexity?

And you can split it into whatever size you want.
 
I believe that the extra bus for the CPU is the first of the 3 major modifications that Cerny mentioned in the Gamasutra interview.
 
It's no different from any x86 processor, where you have to, for example, tag memory pages for write combining vs. cached, etc.
One of the things that the page table in x86 processors has never supported is mapping the same physical page to two different virtual locations with different attributes.
This was commonly done on non-x86-based consoles, for example by using the top bit of the address to signify cached. The application has to ensure cache coherency in this case, but it's not usually an issue.
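
That aliasing trick reads roughly like this; the bit position and helper names are illustrative, not taken from any real console memory map.

    #include <cstdint>

    // The same physical memory viewed through two virtual windows. Following
    // the convention above, the top address bit marks the cached view.
    constexpr std::uintptr_t kCachedBit = std::uintptr_t(1) << 31;

    template <typename T>
    T* CachedView(T* p)
    {
        return reinterpret_cast<T*>(reinterpret_cast<std::uintptr_t>(p) | kCachedBit);
    }

    template <typename T>
    T* UncachedView(T* p)
    {
        return reinterpret_cast<T*>(reinterpret_cast<std::uintptr_t>(p) & ~kCachedBit);
    }

    // The application owns coherency: flush the cached view before reading
    // the same bytes through the uncached one.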
 
So what are the real advantages of unified memory if the processors still have to be treated as if accessing two different memory pools? I had always thought they'd have arbitrary random access to all game memory locations.
 
Do the devs decide where the 'pools' are then? I suppose a bus would be assigned to cover memory addresses in one half, and the other bus deals with memory in the other?
 
Do the devs decide where the 'pools' are then? I suppose a bus would be assigned to cover memory addresses in one half, and the other bus deals with memory in the other?

Honestly I haven't looked, but I would assume it's per page and that when you create a mapping between a physical address and a virtual one, you tag the page with the appropriate attribute.
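
Something along these lines at map time, with hypothetical names (real kernels expose this through their own mapping APIs and page-table flags):

    #include <cstdint>

    enum PageAttr { PAGE_CACHED, PAGE_WRITE_COMBINED, PAGE_UNCACHED };

    // Assumed mapping call: the caching behaviour is chosen per page, at the
    // moment the physical->virtual mapping is created.
    void MapPage(std::uint64_t physAddr, void* virtAddr, PageAttr attr);

    void MapGameBuffers(std::uint64_t statePhys, void* stateVirt,
                        std::uint64_t vertsPhys, void* vertsVirt)
    {
        MapPage(statePhys, stateVirt, PAGE_CACHED);           // CPU read/write data
        MapPage(vertsPhys, vertsVirt, PAGE_WRITE_COMBINED);   // CPU writes, GPU reads
    }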
 
So what are the real advantages of unified memory if the processors still have to be treated as if accessing two different memory pools? I had always thought they'd have arbitrary random access to all game memory locations.

Not that I have many concrete facts as it stands, but I always thought that unified memory was in essence the erasing of the inherent split between VRAM and system memory allocation in PCs (not to be confused with the small OS memory footprint in consoles).


As opposed to what you're suggesting, with the CPU and GPU managing the same pool of memory, which is the UMA we're discussing for these APUs.

Is that incorrect?
 
So what are the real advantages of unified memory if the processors still have to be treated as if accessing two different memory pools? I had always thought they'd have arbitrary random access to all game memory locations.

It's not a question of access, it's a question of behavior.

The problem with two pools has always been that at least one of the pools was inconveniently sized, not that there were two pools. Even on Xbox 360 things are commonly split into write-combined and cached areas of memory based on how they will be used.
 
It's no different from any x86 processor, where you have to, for example, tag memory pages for write combining vs. cached, etc.
One of the things that the page table in x86 processors has never supported is mapping the same physical page to two different virtual locations with different attributes.
This was commonly done on non-x86-based consoles, for example by using the top bit of the address to signify cached. The application has to ensure cache coherency in this case, but it's not usually an issue.

In the Durango technical thread I posted an HSA roadmap from the Bonaire Anandtech review and there were a few items listed as "2013" features:

1. Unified address space for CPU and GPU
2. Fully coherent memory between CPU and GPU
3. GPU uses pageable system memory via CPU pointers

Are any of these related to the topic here and how the memory pools are split?
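
For what it's worth, item 3 ("GPU uses pageable system memory via CPU pointers") would look roughly like the following from the application side. EnqueueGpuJob() is a made-up stand-in for whatever the real runtime exposes, not an actual API.

    #include <cstddef>
    #include <vector>

    // Made-up submission call -- not a real API, just the shape of the idea.
    void EnqueueGpuJob(const char* kernelName, void* data, std::size_t bytes);

    int main()
    {
        // Ordinary pageable CPU allocation: no dedicated video-memory
        // allocator and no staging copy. With a unified, coherent address
        // space the GPU can follow this pointer directly.
        std::vector<float> particles(1 << 20);
        EnqueueGpuJob("update_particles", particles.data(),
                      particles.size() * sizeof(float));
    }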
 
Now this is all based on my own take on the technology, so may likely be wrong.

The memory is definitely not split by fixed ratios. The feature requested by Sony was to allow memory to be tagged as CPU non-cached (bypasses the CPU caches). This is a behavioral matter that even mediocre developers should easily be able to grasp. This all depends on how the developers tag the memory. It's simply a matter of them profiling the algorithms implemented to determine if they'd benefit from bypassing the CPU caches.

It's such a non-issue in the overall scheme of things. It's nowhere near the issue of dealing with split-pools.
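
If anything, the only real work is the profiling step: time the CPU-side access pattern with the buffer tagged both ways and keep whichever wins. A rough skeleton, with AllocTagged() as an assumed helper:

    #include <chrono>
    #include <cstddef>

    void* AllocTagged(std::size_t size, bool cpuCached);   // assumed helper

    // Time a representative CPU access pattern over the buffer. Run it once
    // against a cached allocation and once against a non-cached one, then
    // tag the real allocation whichever way the numbers favour.
    double TimeAccessMs(void* buf, std::size_t size,
                        void (*pattern)(void*, std::size_t))
    {
        auto t0 = std::chrono::steady_clock::now();
        pattern(buf, size);
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::milli>(t1 - t0).count();
    }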
 
The memory is definitely not split by fixed ratios. The feature requested by Sony was to allow memory to be tagged as CPU non-cached (bypasses the CPU caches).
That makes a lot more sense. Rereading, he mentions allocating memory, so it's a case of the devs deciding themselves what goes through which bus. If that's arbitrary pages of RAM and not a fixed address split, I understand how it's different to split RAM (a case of where the dev wants to put the data, and not where they have to put the data).
 
So it's unified memory but on two separate buses where you have to place the data in different memory locations depending on where you access it from? Sounds like it's as much faff as split RAM then!
My understanding of it is that Sony should develop an API + driver that leverages the benefits of those special links by itself.

Though it's not exactly the topic at this point, I think it's more of a bother than anything else to give devs such low-level access to the hardware, be it on Durango or Orbis. If I read a couple of posts by A. Lauritzen on the matter right, I'm inclined to believe that it creates more problems than it solves.

I hope at least MSFT moves on from that approach and has a proper roadmap for its hardware instead of a limited series of "dumb" shrinks.

Edit: I'm aware of the perf issues, but when both manufacturers are throwing 25% of the CPU power at tasks that are, IMO, not a priority for anyone (we don't know for Sony, but MSFT throws away more than 33% of its RAM for the same reasons mentioned above), I'm not sure to what extent that is relevant / noise.
 
Assassin's Creed 4 Tech Analysis.

Pretty much what I was expecting... better on things they can do easily for next-gen (higher resolution, textures, particles, shaders), but still using all of the same assets from current-gen. So not the "next-gen" AC that we were hoping for. Sounds like it's more or less on par with AC3's PC version, but capped at 30fps (but at least it's a steady 30fps).
 
Assassin's Creed 4 Tech Analysis.

Pretty much what I was expecting... better on things they can do easily for next-gen (higher resolution, textures, particles, shaders), but still using all of the same assets from current-gen. So not the "next-gen" AC that we were hoping for. Sounds like it's more or less on par with AC3's PC version, but capped at 30fps (but at least it's a steady 30fps).

Yeah, we're going to have to wait for games that aren't released on current console to show off what next-gen consoles can really do. Few launch console games ever feel like they were specifically designed for the system instead of just being ports or purgatory projects that suddenly see the light of day.
 
Agreed. 'tis always a tricky time, at cross-gen.

I fully understand why they do it; it's simple market saturation. While it's a sure bet that anyone with a next-gen console will buy the next-gen version, the market penetration is simply too low to be cost-effective for a big AAA title. Makes me sad, though. I am definitely paying attention to games specifically created around the new hardware.
 
If only there was some means of cross-generation gaming. If AC4 is anything like AC:B or AC:R, then I will have to base which version I purchase on what version my friends get, if I ever want to play multiplayer. It's a bit of a shame, since I think the real experience of the game is strictly the single-player aspect.

I was hoping DF article would maybe have some details on that.
 