I have a few questions about PS3/Xbox 360 RAM

The main advantage of the PS3's NUMA setup is that the main memory the CPU will be using is lower latency (XDR vs. GDDR3) and closer to the CPU, which is something CPUs certainly care about, even though GPUs may not. That's as opposed to the UMA of the 360, where there is definitely potential for the GPU and CPU to contend for the same bus. A single cache miss on 360 can easily cost you 500 cycles. Don't really know yet about PS3.
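If anyone wants to see roughly what a miss out to DRAM costs on whatever PC they have handy, a minimal pointer-chase sketch (sizes and numbers purely illustrative, nothing Xenon- or Cell-specific) looks something like this:

/* Minimal pointer-chase sketch: walk a randomly permuted array much
 * larger than the caches so almost every load misses and exposes
 * DRAM latency. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NODES (1u << 22)   /* ~4M entries (32 MB), well past any L2 */

int main(void)
{
    size_t *next = malloc((size_t)NODES * sizeof *next);
    size_t i, j, tmp, idx = 0;
    if (!next) return 1;

    /* Sattolo's algorithm: one big random cycle through all entries,
     * so the walk can't settle into a small, cache-resident loop
     * (assumes RAND_MAX is large, as with glibc). */
    for (i = 0; i < NODES; i++) next[i] = i;
    srand(1);
    for (i = NODES - 1; i > 0; i--) {
        j = (size_t)rand() % i;
        tmp = next[i]; next[i] = next[j]; next[j] = tmp;
    }

    clock_t t0 = clock();
    for (i = 0; i < NODES; i++)
        idx = next[idx];   /* each load depends on the last */
    clock_t t1 = clock();

    printf("~%.1f ns per dependent load (idx=%zu)\n",
           1e9 * (double)(t1 - t0) / CLOCKS_PER_SEC / NODES, idx);
    free(next);
    return 0;
}

A DRAM round trip in the 100-150 ns range is already several hundred cycles at 3.2 GHz, which is the right ballpark for a figure like 500 cycles on an in-order core.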

Of course, having a single memory pool has its advantages. Mainly in the fact that it's a lot easier, and the fact that everything actually is *physically* in the same memory pool. While you could say that PS3 has a single memory pool in *virtual* memory space (which relates back to the GPU accessing main memory, the CPU accessing VRAM, and the SPEs accessing each other's LSes), that's not really the same thing in practice.

Last I heard it was 64 MB of eDRAM on the GPU.
That sounds a lot like what I said on another forum -- I said that if you were going to use an eDRAM framebuffer for RSX, you'd need at least 64 MB assuming the goal of dual 1080p at 128-bit HDR -- a little different from saying there IS 64 MB of eDRAM.
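(Presumably the arithmetic behind that estimate: 1920 x 1080 pixels x 16 bytes per pixel for a 128-bit colour format is about 32 MB per buffer, so two 1080p buffers land right around that 64 MB mark, before any Z/stencil or multisampling.)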
 
Vysez said:
scooby_dooby said:
I'm also interested in the benefits of unified memory vs. non-unified.
It's NUMA vs. UMA (.pdf), to be exact.

Interestingly, the NUMA link makes a distinction between 'NUMA' and 'ccNUMA'. In the UMA link, they discuss some of the disadvantages while seemingly describing a PCI PC from a few years ago, but those inefficiencies are exactly what KK described as being removed for the PS3:

KK said:
For example, RSX is not a variant of nVIDIA's PC chip. CELL and RSX have a close relationship, and both can access the main memory and the VRAM transparently. CELL can access the VRAM just like the main memory, and RSX can use the main memory as a frame buffer. They are just separated by their main usage, and there is no real distinction between them.

This architecture was designed to kill wasteful data copying and recalculation between CELL and RSX. RSX can directly refer to a result simulated by CELL, and CELL can directly refer to the shape of an object RSX has added shading to (note: CELL and RSX have independent bidirectional bandwidths, so there is no contention). That's something shared memory can't do, no matter how beautiful the rendering or how complicated the shading.

http://www.beyond3d.com/forum/viewtopic.php?p=528125#528125

Which means that strictly speaking a PC would be 'NUMA' and the PS3 would be 'ccNUMA'?

EDIT:

An earlier discussion on this:

http://www.beyond3d.com/forum/viewtopic.php?t=24483
 
I suspect RSX's ability to access XDR RAM is no different than G70's ability to access a PC's DDR RAM.

On a PC it's called TurboCache. It is actually a function of the PCI Express architecture.

So I think it's reasonable to expect that the FlexIO interface on RSX provides an analogous "TurboCache" function for RSX in PS3.

Jawed
 
Jaws said:
Which means that strictly speaking a PC would be 'NUMA' and the PS3 would be 'ccNUMA'?

A PC isn't NUMA (speaking of the CPU and the graphics card) - there are two separate pools of memory, so it's non-unified (NUMA = non-uniform memory access).
NUMA always refers to a single pool of memory, AFAIK.
 
Jawed said:
I suspect RSX's ability to access XDR RAM is no different than G70's ability to access a PC's DDR RAM.

On a PC it's called TurboCache. It is actually a function of the PCI Express architecture.

So I think it's reasonable to expect that the FlexIO interface on RSX provides an analogous "TurboCache" function for RSX in PS3.

Jawed

Yes, I'm familiar with this. I'd take this further and say that CELL's PPE L2 cache would be cache coherent with RSX's TurboCache, and with no data duplication. But can an x86 CPU access the VRAM on a PC TurboCache GPU?

EDIT:

Npl said:
Jaws said:
Which means that strictly speaking a PC would be 'NUMA' and the PS3 would be 'ccNUMA'?

A PC isn't NUMA (speaking of the CPU and the graphics card) - there are two separate pools of memory, so it's non-unified (NUMA = non-uniform memory access).
NUMA always refers to a single pool of memory, AFAIK.

Yeah, that sounds right to me. It's the TurboCache GPU and an x86 CPU that are blurring the issue?
 
Jaws said:
Yes, I'm familiar with this.

I was making the point for the general edification of the thread. KK's hype, as usual, needs dispelling.

I'd take this further and say that CELL's PPE L2 cache would be cache coherent with RSX's TurboCache

In the sense that any writes by RSX would cause Cell's PPE L2 cache to be marked as dirty, yes.

and with no data duplication.

Dunno what you're getting at there - a cache duplicates a portion of memory, except when a write to memory is held in cache, waiting to be flushed.

But can an x86 CPU access the VRAM on a PC TurboCache GPU?

Dunno! It also depends on whether you mean directly addressable memory, or memory accessed via a driver.

I expect with Longhorn's virtual memory model for GPUs that GPU VRAM would become part of the PC's main memory space. But I dunno...

Jawed
 
Jaws said:
Yeah, that sounds right to me. It's the TurboCache GPU and an x86 CPU that are blurring the issue?

Just looked up the definition, and the first thing that struck me is that it's defined for systems of multiple processors. Don't know how well a GPU fits in there.
NUMA, strictly speaking, means you have local and remote memory with different access times/bandwidth. So if you take the CPU, from its point of view there's only one pool of memory; take the TurboCache GPU, and there's still only one pool of memory, which makes this... something different from NUMA for sure ;)
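To make that concrete, here's what 'local and remote memory in one address space' looks like in software on an ordinary multi-socket Linux box, using libnuma (nothing PS3-related; the node numbers are just whatever the machine reports):

/* Sketch: allocate from a specific NUMA node with libnuma and touch it.
 * Build with: gcc numa_sketch.c -lnuma */
#include <stdio.h>
#include <numa.h>

int main(void)
{
    if (numa_available() < 0) {
        puts("no NUMA support on this machine");
        return 1;
    }

    printf("nodes: %d\n", numa_max_node() + 1);

    /* Memory local to node 0: fast for CPUs on node 0, slower (remote)
     * for CPUs on any other node, but all of it lives in one address
     * space that every CPU can load from and store to. */
    size_t len = 1 << 20;
    char *buf = numa_alloc_onnode(len, 0);
    if (!buf) return 1;

    for (size_t i = 0; i < len; i += 4096)
        buf[i] = 1;   /* fault the pages in on node 0 */

    numa_free(buf, len);
    return 0;
}

The key point is that it's one address space with different distances, not two disjoint pools, which is why a plain PC CPU plus discrete card doesn't really fit the definition.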
 
Jawed said:
and with no data duplication.

Dunno what you're getting at there - a cache duplicates a portion of memory, except when a write to memory is held in cache, waiting to be flushed.
Presumably Jaws is talking about Cell being able to write to a 'cache store' directly accessible by the GPU, instead of data being written to CPU cache, exported to RAM, read into GPU cache and used. In essence, direct CPU-to-GPU communication instead of leaving data at a central repository to be fetched by either processor.
 
Jawed said:
I'd take this further and say that CELL's PPE L2 cache would be cache coherent with RSX's TurboCache

In the sense that any writes by RSX would cause Cell's PPE L2 cache to be marked as dirty, yes.

And vice versa... writes by CELL would mark RSX's cache...

Jawed said:
and with no data duplication.

Dunno what you're getting at there - a cache duplicates a portion of memory, except when a write to memory is held in cache, waiting to be flushed.

Stressing no duplication of data in 'system' memory, which includes both the XDR and GDDR3 pools.

Jawed said:
But can an x86 CPU access the VRAM on a PC TurboCache GPU?

Dunno! It also depends on whether you mean directly addressable memory, or memory accessed via a driver.

I expect with Longhorn's virtual memory model for GPUs that GPU VRAM would become part of the PC's main memory space. But I dunno...

Jawed

Yeah, I was referring to directly addressable memory...hmm...

Npl said:
Jaws said:
Yeah, that sounds right to me. It's the TurboCache GPU and an x86 CPU that are blurring the issue?

Just looked up the definition, and the first thing that struck me is that it's defined for systems of multiple processors. Don't know how well a GPU fits in there.
NUMA, strictly speaking, means you have local and remote memory with different access times/bandwidth. So if you take the CPU, from its point of view there's only one pool of memory; take the TurboCache GPU, and there's still only one pool of memory, which makes this... something different from NUMA for sure ;)

Well that clears it up, not! :p

So a PC is UMA if you exclude the GPU. If the GPU has a cache controller that also writes to CPU-local memory, then it's still UMA; but if the CPU cache controller can also write to GPU-local memory, then it's NUMA, no?! :p

EDIT:

Shifty Geezer said:
Jawed said:
and with no data duplication.

Dunno what you're getting at there - a cache duplicates a portion of memory, except when a write to memory is held in cache, waiting to be flushed.
Presumably Jaws is talking about Cell being able to write to a 'cache store' directly accessible by the GPU, instead of data being written to CPU cache, exported to RAM, read into GPU cache and used. In essence, direct CPU-to-GPU communication instead of leaving data at a central repository to be fetched by either processor.

Yeah... snooping each other's caches...
 
Jaws said:
Yes, I'm familiar with this. I'd take this further and say that CELL's PPE L2 cache would be cache coherent with RSX's TurboCache, and with no data duplication. But can an x86 CPU access the VRAM on a PC TurboCache GPU?

Most video cards' RAM is directly mapped into x86 memory; the way it's done is simply to mark those pages as uncacheable to the CPU.

Any other solution involves redesigning the ROPs and GPU memory controllers to be multi-device aware. Which hasn't happened so far, probably with good reason...
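On Linux you can actually see (and map) that aperture from user space through sysfs. A rough sketch, with the PCI address below purely hypothetical (look yours up with lspci), and it needs root:

/* Sketch: map a video card's memory BAR straight into a user process.
 * The kernel maps these pages uncached, which is the "mark as
 * uncacheable to the CPU" part.  Read-only on purpose; scribbling over
 * live VRAM is a good way to corrupt the display. */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
    /* resource0 is typically the card's first memory BAR; the bus
     * address "0000:01:00.0" is made up for illustration. */
    const char *bar = "/sys/bus/pci/devices/0000:01:00.0/resource0";

    int fd = open(bar, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    size_t len = 4096;   /* just the first page */
    volatile uint32_t *vram =
        mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (vram == MAP_FAILED) { perror("mmap"); return 1; }

    printf("first word of the aperture: 0x%08x\n", (unsigned)vram[0]);

    munmap((void *)vram, len);
    close(fd);
    return 0;
}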
 
DeanoC said:
Jaws said:
Yes, I'm familiar with this. I'd take this further and say that CELL's PPE L2 cache would be cache coherent with RSX's TurboCache, and with no data duplication. But can an x86 CPU access the VRAM on a PC TurboCache GPU?

Most video cards' RAM is directly mapped into x86 memory; the way it's done is simply to mark those pages as uncacheable to the CPU.

Any other solution involves redesigning the ROPs and GPU memory controllers to be multi-device aware. Which hasn't happened so far, probably with good reason...

I'm guessing it hasn't happened so far because, until PCI Express, the PC bus architecture wasn't capable of it. And now that it is capable, thanks to two-way comms, memory is still mapped this way because commodity memory is cheap enough to absorb the data-duplication inefficiency, and PCI-E bandwidths are still relatively low, so the need for other solutions isn't as strong...
 