DeanoC blog PS3 memory management

Vysez said:
I fail to understand your logic.
NUMA means that the GPU can address the Main Memory pool, just as the CPU can address the VRAM. In other words, you don't "have to" stock all your data anywhere... obviously it's more practical to do some things logically, but you're not forced to that.
Also, if your scene is composed of 412MB of texture data, I think you have to fire your Lead Programmer, now.

Only the Xbox uses an UMA.

I really can't understand why you don't get my two simple examples...

NUMA means that the gpu can't directly address the system memory and the cpu can't directly address gpu local memory; this is a simple concept, but a very bad thing for a developer

also I don't understand why 400+ MB of texture data is bad, do you have any idea how much color texture + normal map data will be required in the next few years?

doom3 in ultra mode uses 480+ MB of normal + color textures, and this is an old game
how much data will a finished game (not a tech demo) based on Unreal Engine 3 require in the next 2-3 years?
do you think that a 256 MB solution (old pc cards and ps3) will be ok when scene detail grows that much?

and with UMA it is easy to make procedural textures and geometry, because the cores can work on gpu data without moving a lot of it from one point of the machine to another (which doubles the memory reserved for this data and adds a lot of extra steps)
 
Shifty Geezer said:
I hear there's a nice Irish pub in London. You wouldn't per chance be considering a visit there in the not too distant future?
yeaahh, I hear it's gonna be packed with people sometime this month I believe;)
 
Vysez said:
Only the Xbox uses an UMA.

this is a ps2
mallinson_02.gif


figure_11a.jpg



;)
 
The PS3's memory problem is more like this AFAIK:

The framebuffer can't be split between different memory banks, so it all has to be in the VRAM. But you want to maximize framebuffer bandwidth, so you'd like to put (most of) your textures into the main RAM so that accessing them won't cut into the already limited VRAM bandwidth.

Consequences:
- you'll have 100-200MB of VRAM to use for something that wouldn't consume too much bandwidth
- you'll have to send textures to the GPU through Cell / via the FlexIO bus (could this be why Nvidia needed a lot of time to rearchitect the RSX, so that the texture cache can work with this method?)
- you'll have to pay attention to and organize and measure a lot more data traffic inside the system, which may take valuable development time from other tasks

PS3 devs, feel free to correct my theories ;)
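
To put rough numbers on the framebuffer point above, here is a back-of-the-envelope sketch. The 1280x720 resolution, 4x multisampling and 32-bit colour and depth formats are assumptions chosen for illustration, not figures from Sony or from this thread; only the 256 MB local pool size is taken from the discussion.

```python
def render_target_bytes(width, height, bytes_per_pixel, samples=1):
    """Size in bytes of one render target (colour or depth)."""
    return width * height * bytes_per_pixel * samples

MIB = 1024 * 1024

# Assumed 720p target with 4x multisampling (illustrative values only).
WIDTH, HEIGHT, SAMPLES = 1280, 720, 4

colour = render_target_bytes(WIDTH, HEIGHT, 4, SAMPLES)  # assumed 32-bit colour
depth = render_target_bytes(WIDTH, HEIGHT, 4, SAMPLES)   # assumed 32-bit depth/stencil
resolved = render_target_bytes(WIDTH, HEIGHT, 4)         # single-sample front buffer

framebuffer_total = colour + depth + resolved
vram_left = 256 * MIB - framebuffer_total                # 256 MB local pool

print(f"framebuffer: {framebuffer_total / MIB:.1f} MiB")
print(f"left in a 256 MiB pool: {vram_left / MIB:.1f} MiB")
```

Under these assumptions the render targets take a few tens of megabytes, so the open question is less about capacity and more about how much of the VRAM bandwidth they consume.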
 
Griffith said:
NUMA means that the gpu can't directly address the system memory and the cpu can't directly address gpu local memory; this is a simple concept, but a very bad thing for a developer
Who said the CPU and the GPU, in the PS3's case, couldn't address the two memory pools "directly"? And what do you mean by "directly" in the first place?
Griffith said:
also I don't understand why 400+ MB of texture data is bad
If you're working with less than 512MB of RAM (due to the OS footprint) and you have allocated 400+MB for textures alone, then, I repeat, you need to fire the lead programmer right away.
Griffith said:
doom3 in ultra mode uses 480+ MB of normal + color textures, and this is an old game
480MB of uncompressed normal maps. On a PC with, for the Ultra setting, more than 1GB of RAM and a 512MB graphics card. And Doom 3 uses no memory-saving tricks whatsoever. It just loads the level, initializes the data and the engine, and runs.
 
Griffith said:
NUMA means that the gpu can't directly address the system memory and the cpu can't directly address gpu local memory; this is a simple concept, but a very bad thing for a developer

I thought Non-Uniform Memory Access computers just have another "shared" memory area with higher latency than local memory. That does not mean that a processor element can't access the shared memory directly, does it? I am waiting for Sony to release more info about how RSX interacts with Cell.

PS3 and Xbox 360 have different design philosophies. It seems to me that the latter focuses on savings and resource optimization while the former focuses on speed optimization and raw power. Both are flexible in their own ways. *If the conditions are conducive* we should see a significant speedup on PS3 just because it's engineered that way.

It's hard to talk about just 1 element and say this is an overriding weakness. The whole concept works together to deliver the end result.
 
Laa-Yosh said:
The PS3's memory problem is more like this AFAIK:

The framebuffer can't be split between different memory banks, so it all has to be in the VRAM. But you want to maximize framebuffer bandwidth, so you'd like to put (most of) your textures into the main RAM so that accessing them won't cut into the already limited VRAM bandwidth.

Consequences:
- you'll have 100-200MB of VRAM to use for something that wouldn't consume too much bandwidth
- you'll have to send textures to the GPU through Cell / via the FlexIO bus (could this be why Nvidia needed a lot of time to rearchitect the RSX, so that the texture cache can work with this method?)
- you'll have to pay attention to and organize and measure a lot more data traffic inside the system, which may take valuable development time from other tasks

PS3 devs, feel free to correct my theories ;)


your questions and conclusions are based on a pc card, not the real rsx...
 
Vysez said:
4M E-DRAM

hey, this is a joke, isn't it?

with your logic, the 360 uses NUMA because it has edram
:D

no, the ps2 stores GS data and CPU data in the same memory, edram is used for the framebuffer and for small particle effects

I assume you are joking when you say that edram means the GS and VU can't directly access main system memory for textures and geometry data (this is the concept of 'unified memory')

being able to use more than 256 MB for textures and data, or needing far fewer steps in some cases, are not the only advantages of UMA

here is some useful reading:

http://arstechnica.com/articles/paedia/cpu/xbox360-1.ars/2
http://arstechnica.com/articles/paedia/cpu/xbox360-1.ars/3

There are two options for the vertex data output of these threads, one of which is that they can be moved into main memory for later use by the GPU. This would happen in situations where the GPU doesn't need the data immediately and would like to stream it from memory at a later time
http://arstechnica.com/articles/paedia/cpu/xbox360-1.ars/4


and read what Dave wrote ;)

**********
In addition to its other capabilities Xenos has a special instruction which is presently unique to this graphics processor and may not necessarily even be available in WGF2.0 and this is the MEMEXPORT function. In simple terms the MEMEXPORT function is a method by which Xenos can push and pull vectorised data directly to and from system RAM. This becomes very useful with vertex shader programs as with the capabilities to scatter and gather to and from system RAM the graphics processor suddenly becomes a very wide processor for general purpose floating point operations. For instance, if a shader operation could be run with the results passed out to memory and then another shader can be performed on the output of the first shader with the first shader's results becoming the input to the subsequent shader.

MEMEXPORT expands the graphics pipeline further forward and in a general purpose and programmable way. For instance, one example of its operation could be to tessellate an object as well as to skin it by applying a shader to a vertex buffer, writing the results to memory as another vertex buffer, then using that buffer run a tessellation render, then run another vertex shader on that for skinning. MEMEXPORT could potentially be used to provide input to the tessellation unit itself by running a shader that calculates the tessellation factor by transforming the edges to screen space and then calculates the tessellation factor on each of the edges dependant on its screen space and feeds those results into the tessellation unit, resulting in a dynamic, screen space based tessellation routine. Other examples for its use could be to provide image based operations such as compositing, animating particles, or even operations that can alternate between the CPU and graphics processor.

With the capability to fetch from anywhere in memory, perform arbitrary ALU operations and write the results back to memory, in conjunction with the raw floating point performance of the large shader ALU array, the MEMEXPORT facility does have the capability to achieve a wide range of fairly complex and general purpose operations; basically any operation that can be mapped to a wide SIMD array can be fairly efficiently achieved and in comparison to previous graphics pipelines it is achieved in fewer cycles and with lower latencies. For instance, this is probably the first time that general purpose physics calculation would be achievable, with a reasonable degree of success, on a graphics processor and is a big step towards the graphics processor becoming much more like a vector co-processor to the CPU.

Seeing as MEMEXPORT operates over the unified shader array the capability is also available to pixel shader programs, however the data would be represented without colour or Z information which is likely to limit its usefulness.

ATI indicate that MEMEXPORT functions can still operate in parallel with both vertex fetch and filtered texture operations

Dave
********
http://www.beyond3d.com/articles/xenos/index.php?p=10

UMA is far better than a pc-centric architecture with 256 MB of GDDR local to the gpu and a separate 256 MB of non-GDDR cpu memory (add latency problems on top of that)
 
Griffith said:
UMA is far better than a pc-centric architecture with 256 MB of GDDR local to the gpu and a separate 256 MB of non-GDDR cpu memory (add latency problems on top of that)
A: the 360 isn't UMA; in order to qualify for that you need ONE memory pool, and it has two separate. In comparison, PS2 and gamecube both have four.

B: UMA is unsuitable for high-performance computer systems because lots of accesses to the same memory pool increases randomness, and hence page-break penalties and other overhead. DRAM works most efficiently when running in long bursts, and UMA access patterns diverge from that the more stuff you have going on in the system. Xbox performance suffered from the unpredictable nature of GPU memory accesses in its UMA system architecture.

Besides, your claim cell can't directly access GPU RAM and other way around is essentially bull. There's no discernible difference between cell accessing GPU RAM for example compared to a pentium 4 accessing its main RAM; both go through an external interface to a DRAM controller located on a separate IC. Your claim this setup is 'very bad for developers' is completely unfounded and essentially grabbed out of thin air. You might as well argue it's very good for developers, because now you have actually doubled aggregate system bandwidth compared to a UMA design!
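
The page-break argument in point B can be illustrated with a toy model. This is only a sketch under loud assumptions: a single DRAM bank with one open row, a 2048-byte row size, 64-byte accesses, and strict alternation between two clients. Real memory controllers have several banks and reorder requests, so the absolute numbers mean nothing; only the direction of the effect matters.

```python
def count_row_misses(addresses, row_size=2048):
    """Count accesses that must open a new row (single bank, one open row)."""
    misses = 0
    open_row = None
    for addr in addresses:
        row = addr // row_size
        if row != open_row:
            misses += 1
            open_row = row
    return misses

def sequential_stream(base, count, stride=64):
    """A client reading `count` consecutive 64-byte chunks starting at `base`."""
    return [base + i * stride for i in range(count)]

# Two hypothetical clients, each streaming linearly through its own buffer.
cpu = sequential_stream(0x000000, 256)
gpu = sequential_stream(0x400000, 256)

# Separate pools: each stream has a memory to itself and stays in long bursts.
separate = count_row_misses(cpu) + count_row_misses(gpu)

# One shared pool: the two streams alternate on the same bank.
interleaved = [addr for pair in zip(cpu, gpu) for addr in pair]
shared = count_row_misses(interleaved)

print("row misses, separate pools:", separate)
print("row misses, shared pool:  ", shared)
```

In this model every access in the shared case lands on a different row from the previous one, which is the "increased randomness" described above taken to its extreme.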
 
This is getting tiresome...

Griffith said:
hey, this is a joke, isn't it?

with your logic, the 360 uses NUMA because it has edram
:D
Nope, I'm not joking.

Can the GS address the Main Memory? No. All the data needed to render a (part of the) frame has to be loaded in the eDRAM. Therefore this is not a UMA system.

Can Xenos address the Main Memory? Yes. Is there any point in using 10MB of eDRAM, which contains the ROPs, in a NUMA fashion? Honestly, that's something nobody would have thought of...

BTW, I don't see what you're trying to prove, nor what you're discussing in here, Griffith, especially with the links to those articles, I'm starting to think that you're either kidding me, or you didn't read this board for long enough.

edit: Thanks for the more comprehensive answer, Guden.
 
Guden Oden said:
A: the 360 isn't UMA; in order to qualify for that you need ONE memory pool, and it has two separate. In comparison, PS2 and gamecube both have four.

B: UMA is unsuitable for high-performance computer systems because lots of accesses to the same memory pool increases randomness, and hence page-break penalties and other overhead. DRAM works most efficiently when running in long bursts, and UMA access patterns diverge from that the more stuff you have going on in the system. Xbox performance suffered from the unpredictable nature of GPU memory accesses in its UMA system architecture.

Besides, your claim cell can't directly access GPU RAM and other way around is essentially bull. There's no discernible difference between cell accessing GPU RAM for example compared to a pentium 4 accessing its main RAM; both go through an external interface to a DRAM controller located on a separate IC. Your claim this setup is 'very bad for developers' is completely unfounded and essentially grabbed out of thin air. You might as well argue it's very good for developers, because now you have actually doubled aggregate system bandwidth compared to a UMA design!


your point A is a very uneducated one
first, 360 IS UMA
http://www.google.it/search?hl=it&q=360+unified+memory&btnG=Cerca+con+Google&meta=

second, unified memory means that you can directly address and store cpu, graphical, audio data, NOT that the system has only 1 type - 1 kind - 1 pool of memory

cache, registers, hard disk, gddr, edram, memory cards, flash memory are ALL 'memory'
a system with just 1 block of memory doesn't exist, this is NOT the meaning of 'unified' (roll eyes)

another uneducated claim is the "aggregate bandwidth" thing
storing and using data in system memory via cell-flexio is inefficient compared to direct access to the local 256 MB GDDR3; add to this lower efficiency the well-known high latency of RAMBUS memory, and then explain to me what kind of performance advantage you can get

and again, the frame buffer MUST be in one memory, you can't use flexio to speed up any of the FB operations, I am talking about bandwidth-killer ops like HDR, AA, Z, and filling the FB

no, those ops will cut the bandwidth of the GDDR3 local mem in two (let me remind you that the bus is still a 128-bit one, which adds another problem on top)
 
Vysez said:
Nope, I'm not joking.

Can the GS address the Main Memory? No.

YES IT can

mallinson_02.gif


All the data needed to render a (part of the) frame has to be loaded in the eDRAM.

wrong again!!
"all the data needed" eh?
and where do you think the textures are stored?
you are skipping the point, answer this please :D
Therefore this is not a UMA system.

wrong premise, wrong conclusion ;)

BTW, I don't see what you're trying to prove, nor what you're discussing in here, Griffith, especially with the links to those articles, I'm starting to think that you're either kidding me, or you didn't read this board for long enough.

I think that if you don't see what I am proving, you really don't want to see it
the links talk about the advantages of the 360's UMA; if you only clicked on them, it would be evident, it seems that you skip everything just to keep your position
I know that you are the moderator, but this doesn't mean that you can tell a person that "you didn't read this board for long enough", especially after you say:

360 is uma, but ps2 is not, because ps2 stores all the data needed to render a frame in 4 MB eDRAM (FALSE!)

first, the 360 has eDRAM too, so you don't agree with yourself; second, you talk about me personally and yet you don't explain how the ps2 puts all the needed data in 4 MB, do ps2 games really use less than 2 MB for textures? (I assume that 2 MB are reserved for the FB, right?)
 
This is getting ridiculous...

Griffith said:
YES IT can


wrong again!!
"all the data needed" eh?
and where do you think the textures are stored?
you are skipping the point, answer this please :D


wrong premise, wrong conclusion ;)
So, you're telling me that the GS can address the Main Memory?
You're also saying that the GS can render a frame without any data stored on the eDRAM, right?
Only the framebuffers and the "particles things" are stored on the eDRAM, right?

You know what, you should tell that to the guys on this forum who worked, or still work, on the PS2; they will all slap their foreheads and say "This was so obvious from the beginning, why didn't we see it first!"
Griffith said:
360 is uma, but ps2 is not, because ps2 stores all the data needed to render a frame in 4 MB eDRAM (FALSE!)

first, the 360 has eDRAM too, so you don't agree with yourself;
I said that UMA wasn't the preferred choice if you had the choice between a classical architecture, a UMA or a NUMA, and then I added that in the X360's case it wasn't an issue since it had an eDRAM memory pool. If it has more than one memory pool, it's not UMA. QED.
Therefore I fail to see any contradiction in my claims.


Griffith said:
yet you don't explain how the ps2 puts all the needed data in 4 MB, do ps2 games really use less than 2 MB for textures? (I assume that 2 MB are reserved for the FB, right?)
Yes, and the GC has 1MB of eDRAM for its texture cache. Welcome to the real world, Griffith.
I could indeed try to explain how a standard rendering pipeline would work on the PS2 or the GC, but honestly, your behavior in this thread proves that it isn't worth the time.

By the way, do not discuss subjects you do not master, or even understand, Griffith.
And if you absolutely want to discuss those things, try not to call the people who correct you liars, because it's not extremely tactful, to say the least.
 
Griffith said:
second, unified memory means that you can directly address and store cpu, graphical, audio data, NOT that the system has only 1 type - 1 kind - 1 pool of memory

cache, registers, hard disk, gddr, edram, memory cards, flash memory are ALL 'memory'
a system with just 1 block of memory doesn't exist, this is NOT the meaning of 'unified' (roll eyes)

The original definition of NUMA is something like this (http://en.wikipedia.org/wiki/Non-uniform_memory_access). PS3 is a NUMA because of *asymmetric* access to the SPU Local Store and the shared "main" memory (XDR and GDDR). Access to the shared memory *can* be direct or indirect e.g., via message passing. Strictly speaking RSX interaction is an unknown, but it should follow the same access characteristics.

*Uniform* Memory Access can span a few pools but *requires* that each processor has uniform access to the memory. There is no concept of local memory. Cache, hard disk, registers, etc. do not count in the definition of NUMA vs UMA. Xenon has 3 cores, each with symmetric and uniform access to the main (shared) memory. Xenos is NUMA if it has local memory (not cache!) to operate on plus access to the shared system memory.

In both cases, you can store "cpu, graphical, audio data" in the shared memory. The only difference is in PS3, you have 2 pools of shared memory.

NUMA was invented as a way to _optimize_ the SMP (UMA) arrangement in order to:
* Control data locality
* Allow higher speed local memory

You may be able to argue better once you acknowledge NUMA's advantages. The split of GDDR and XDR memory pools may not be relevant to NUMA vs UMA discussion.

EDIT:
Just to clarify my last statement, Xbox 360 also has *Unified* memory. I suppose that means combining the memory for the GPU and CPU together... as opposed to PS3's "Split" main memory pools. You can argue either way which one is superior (depending on the size of available memory vs data, frequency of use, ...). I'm leaving out RSX details since I don't know what it is.
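
The UMA vs NUMA definition given in this post can be written down as a small check. This is only a sketch of that definition: the processor names, pool names and relative access costs below are made up for illustration and do not describe the real Xenon, Cell or RSX.

```python
def is_uniform(access_cost):
    """Return True if every processor sees the same cost for every memory pool.

    access_cost maps processor name -> {pool name: relative access cost}.
    A pool missing from a processor's table counts as "not directly accessible",
    which is treated as different from any numeric cost.
    """
    pools = set()
    for costs in access_cost.values():
        pools.update(costs)
    for pool in pools:
        seen = {costs.get(pool) for costs in access_cost.values()}
        if len(seen) != 1:
            return False
    return True

# Hypothetical: three symmetric cores sharing one main memory.
symmetric_cores = {
    "core0": {"main": 10},
    "core1": {"main": 10},
    "core2": {"main": 10},
}

# Hypothetical: two elements, each with a fast private local store
# plus access to the same shared memory.
local_store_elements = {
    "spe0": {"local_store_0": 1, "main": 10},
    "spe1": {"local_store_1": 1, "main": 10},
}

print("symmetric cores uniform?     ", is_uniform(symmetric_cores))
print("local-store elements uniform?", is_uniform(local_store_elements))
```

Note that under this definition the number of shared pools does not matter; what makes the second table non-uniform is the private local memory, which matches the point that the GDDR/XDR split may not be relevant to the NUMA vs UMA question.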
 
I have no idea how they teach to write the code.. However, the program I saw is extremely competitive.. One of the instructors is the lead programmer of the original Far Cry .. So, at least they know what they are talking about. However, they were also complaining about not having access to console development kits. Anyway, here is the link for the program. They are part of SMU in Dallas, TX.
Ah... Guildhall... Yeah, I've heard they're pretty good. I haven't seen any programmers who came out of there, but half the artists at the company I work at came out of Guildhall, so that at least says there's one place that has the idea. I just get disgusted when I see the commercials on G4 -- e.g. the truly evil one with the guys playing something and then saying they're almost done "tightening up the graphics on level 3"... yyyyeah.

Between the trade schools and my experience in university, I just tended to see far extremes. The trade schools (possibly with some exceptions, but I've yet to experience that firsthand) tended to be memorizing how with no why. Show some fundamental examples and what they do, but never get into the motivation and the logic and the theory behind it. University CS programs for me were always broader pure theory and your whole life was defined in terms of hypothetical machine constructs with infinite storage, and it was very rare that you ever got to sit in front of a machine writing any kind of code. I was surrounded by classmates who cried in pain at the minuscule blocks of assembler they had to write in the architecture courses. Numerical analysis? HA! When I took it, I was the only undergrad in the course.
 
ralexand said:
Why is this any different than streaming from DVD. Seems like the question isn't how much disc storage you have but how much data you can get into ~450 MB of ram at 1/30th of a sec. Seems like that would determine graphic fidelity of the game. Seems like blu-ray would only affect graphic quality if you have a game with a single level and that single level has to fit into more than 9 gb of space.
7GB;) , no?
 
I thought the observation that 300 MB is more than either pool in the PS3 was a pretty interesting one. Apparently, the RSX must be able to handle at least reading from both pools at real-time rates, and possibly both reading from and writing to both pools at real-time rates. That's certainly good for development, I would think.
 