Yeah, should be enough for graphics, now we only need ~1GB for the CPU too =).

If the game cards are fast enough then 256MB should be enough, at least for the NGP's resolution and textures. Streaming at, say, 30MB/s could be enough to make the 512MB useful only in rare cases.
Well, I suppose we should all be thankful to Simon F for developing a good 2-bit texture compression format at least! Also TBDR means MSAA won't take more memory. Probably the most interesting factor to consider is the performance penalty if/when the binning process runs out of space, since it takes a *variable* amount of memory.
What happens if the binning process (I suppose that's the "storing and sorting triangles") runs out of memory? Does the GPU then render the incomplete scene to free memory before accepting additional commands? That would require a Z-buffer as well, I suppose...
How do you get a bucket to overflow when you always empty it in time?
ummmm... shake it?
You can't, but that's an analogy for immediate renderers, I guess. With a TBDR you can't render a pixel until you know there's no further triangle that "hits" it, which means you can't empty your bucket until you've poured in all the water (finished the scene). That's assuming the TBDR has to operate in a single pass; if not, it needs to produce an incomplete picture (plus some information about Z values) and can then empty the bucket before accepting more water.
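To put the bucket analogy into something more concrete, here is a toy sketch of a tiler with a fixed-size triangle store. Everything in it (the names, the sizes, the flush policy) is my own invention for illustration, not how SGX actually works; it only shows the two options being discussed: one flush at end of scene versus an early flush that forces colour and depth out to memory.

```cpp
// Toy illustration only: every name and size here is invented, and this is not
// a description of how SGX/IMG hardware actually manages its display lists.
// Normally the "bucket" is emptied once per frame; if the fixed-size store
// fills up, the renderer has to flush mid-frame and keep Z (and colour) in
// memory so the next partial pass can still resolve visibility.
#include <cstddef>
#include <vector>

struct Triangle { float v[9]; };                     // 3 vertices, xyz each
struct Tile     { std::vector<std::size_t> tris; };  // indices into the store

class ToyTiler {
public:
    ToyTiler(std::size_t triangleBudget, int tileCount)
        : budget(triangleBudget), tiles(tileCount) {}

    // Bin one triangle. Returns true if a mid-frame flush was needed first.
    bool submit(const Triangle& t) {
        bool flushedEarly = false;
        if (store.size() >= budget) {                // bucket is full...
            flush(/*endOfScene=*/false);             // ...empty it early
            flushedEarly = true;
        }
        const std::size_t index = store.size();
        store.push_back(t);
        for (Tile& tile : tiles)
            if (mayTouch(t, tile))                   // conservative overlap test
                tile.tris.push_back(index);
        return flushedEarly;
    }

    void endScene() { flush(/*endOfScene=*/true); }  // the normal, single flush

private:
    void flush(bool endOfScene) {
        for (Tile& tile : tiles) {
            // Shade the binned triangles against the on-chip tile buffer.
            // If this is NOT the end of the scene, colour AND depth have to be
            // written back to external memory, which is exactly the penalty
            // (extra bandwidth, external Z-buffer) being discussed here.
            (void)endOfScene;
            tile.tris.clear();
        }
        store.clear();
    }

    bool mayTouch(const Triangle&, const Tile&) const { return true; }  // stub

    std::size_t budget;
    std::vector<Triangle> store;
    std::vector<Tile> tiles;
};
```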
The 4 cores operate in quite a complicated fashion when it comes to macro- and micro-tiling:
http://worldwide.espacenet.com/publ...T=D&date=20090604&CC=WO&NR=2009068895A1&KC=A1
Or display list related patents like that one: http://worldwide.espacenet.com/publ...T=D&date=20090924&CC=WO&NR=2009115778A1&KC=A1
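For what it's worth, here is a rough sketch of what "macro tiles spread over 4 cores" could mean in the simplest possible scheme. The tile sizes and the round-robin assignment are assumptions made up purely for the example; the patents above describe far more involved arrangements.

```cpp
// Purely illustrative: static round-robin assignment of macro tiles to cores.
// Real hardware almost certainly balances the load dynamically; this only
// makes the relationship between micro tiles, macro tiles and cores concrete.
#include <cstdio>

int main() {
    const int cores         = 4;                    // SGX543MP4-style configuration
    const int screenW       = 960, screenH = 544;   // NGP display resolution
    const int microTilePx   = 32;                   // assumed on-chip (micro) tile size
    const int microPerMacro = 4;                    // assumed: 4x4 micro tiles per macro

    const int macroPx  = microTilePx * microPerMacro;        // 128 px
    const int macrosX  = (screenW + macroPx - 1) / macroPx;  // 8
    const int macrosY  = (screenH + macroPx - 1) / macroPx;  // 5

    for (int y = 0; y < macrosY; ++y)
        for (int x = 0; x < macrosX; ++x) {
            const int id   = y * macrosX + x;
            const int core = id % cores;            // naive round-robin distribution
            std::printf("macro tile (%d,%d) -> core %d\n", x, y, core);
        }
    return 0;
}
```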
And how is multicore related to the problem of overflowing buffers?
How is it unrelated in this particular thread in the first place? You've got 4 GPU cores in the NGP, so what, how and why would it overflow? Do you have same-sized or dynamically sized macro tiles for all of the 4 cores, and one or multiple display lists? I can only imagine they use multi-level display lists (or buffers, or whatever one wants to call them), compress the hell out of them and store only the absolutely necessary. Don't you think the engineers have taken those considerations into account? Heck, IMG has more than a few patents covering display list control, compression etc. I can't imagine the display list(s) are so small that they can be overflowed that easily, and I doubt even more that, if you did manage to overflow one, you wouldn't get an IMR or whatever hybrid into the exact same theoretical trouble.

The problem is that you can't start (fragment-)processing a single tile unless you know there is nothing, say a translucent triangle above the ones already in your display list, that affects the outcome. You are limited in the amount of information you can store before you begin rendering, so either you decide to drop something and hope no one notices, or you render what you have and then accept new data (the extreme example being immediate renderers, or some "hybrid" renderer that only defers as long as there is space).

4 cores only means 4 times the buffer size, which doesn't change a thing for me: fixed size is fixed size, and a dynamic workload can exceed it. I'm pretty sure IMG has chosen a way to deal with this, however unlikely a problem it might be. Certainly devs on a closed platform will be wary of any limits (which are likely high enough that you don't hit them in realtime graphics), but a generic OpenGL driver surely has to take such things into account. And I just asked which way they deal with the problem (as I'm curious about the implications).
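One plausible way a driver could "deal with the problem", sketched below purely as my own guess (the page size, heap budget and names are all invented, and this is not a description of IMG's implementation): let the display list grow in pages from a heap, and only fall back to dropping data or a mid-frame partial render when the heap itself is exhausted.

```cpp
// Sketch of a grow-on-demand display list: fixed-size pages are chained until
// a heap budget is hit, at which point the caller is left with exactly the
// options discussed above (drop data, or render-what-you-have mid-frame).
// All sizes and names are invented for illustration.
#include <algorithm>
#include <cstdint>
#include <memory>
#include <vector>

constexpr std::size_t kPageBytes = 64 * 1024;  // assumed page granularity
constexpr std::size_t kHeapPages = 256;        // assumed heap budget (16 MiB)

struct Page { std::uint8_t data[kPageBytes]; std::size_t used = 0; };

class DisplayList {
public:
    // Append binned parameter data. Returns false if the heap is exhausted,
    // i.e. the caller must flush (partial render) and reset() before retrying.
    bool append(const std::uint8_t* src, std::size_t bytes) {
        if (bytes > kPageBytes) return false;  // oversized writes not handled here
        if (pages.empty() || pages.back()->used + bytes > kPageBytes) {
            if (pages.size() >= kHeapPages)
                return false;                  // out of heap space
            pages.push_back(std::make_unique<Page>());
        }
        Page& p = *pages.back();
        std::copy(src, src + bytes, p.data + p.used);
        p.used += bytes;
        return true;
    }

    void reset() { pages.clear(); }            // after a (partial) render

private:
    std::vector<std::unique_ptr<Page>> pages;
};
```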
I'd be interested too in an explanation that's as simple as possible (in order to understand it myself), but I suspect they keep it under wraps as some sort of secret sauce.

It's not necessarily secret sauce as such, but you're right that we haven't talked about it much in public yet. I'll see about changing that, so there's a bit more information out there about how MP works at the work distribution and memory cost level.
Doesn't the GPU-dedicated RAM need to increase in bandwidth as you increase the number of cores? Not theoretically of course, but practically.
You couldn't get a "high-end" version of your GPUs to "infinitely" scale linearly with increasing the number of cores without increasing memory bandwidth, right?
Right, but the bandwidth requirement for extra cores is low.
Actually I think we can be specific in saying that there is no significant change in memory cost associated with multi-core; I'm not sure why anyone would think there was.
ToTTenTranz said: I'm not trying to start a flamewar between comrades or anything.. but where are we standing exactly? Is it "so low" that you consider it "non-significant"? What ratios are we talking about? For each 100% increase in cores, do you need a 10% increase in memory bandwidth? More? Less? Not allowed to specify?

We said the same thing, just in different words. No MP config I know of has considered adjusting its memory config to increase bandwidth to help performance, versus single core. We obviously don't want to give away figures (sadly, I'd love to walk you through a profiled frame or frames and discuss the consumers).