Sony's Next Generation Portable unveiling - PSP2 in disguise

If the game cards are fast enough, then 256MB should be enough, at least for the NGP's resolution and textures. Streaming at, say, 30MB/s could be enough to make 512MB useful only in rare cases.
 
Yeah, should be enough for graphics; now we only need ~1GB for the CPU too =).
Can't see how streaming would help in the case of huge (and possibly editable) worlds. While streaming can help to overcome limitations, it's still no substitute for RAM: it comes at a cost in additional development time and imposes restrictions on the game/levels. Not exactly a good look if a cheap-ass Android game on an otherwise underpowered smartphone can offer a richer experience, all to save 20-30€ on a decent amount of RAM.
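
To put rough numbers on that (a sketch; the 30MB/s card speed is the hypothetical figure from above, not a confirmed spec):

```c
#include <stdio.h>

/* Back-of-the-envelope check on the streaming argument.
 * 30 MB/s is the hypothetical card speed quoted above,
 * not a confirmed NGP spec. */
int main(void)
{
    const double card_mb_per_s = 30.0;  /* assumed game-card read speed  */
    const double extra_ram_mb  = 256.0; /* the 512MB-vs-256MB difference */

    printf("streaming %.0f MB takes %.1f s at %.0f MB/s\n",
           extra_ram_mb, extra_ram_mb / card_mb_per_s, card_mb_per_s);
    /* ~8.5 s: workable for gradual world streaming, far too slow to
     * react to sudden scene changes -- roughly both sides of the
     * argument above. */
    return 0;
}
```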
 
Well, I suppose we should all be thankful to Simon F for developing a good 2bpp texture compression format at least! ;) Also, TBDR means MSAA won't take more memory. Probably the most interesting factor to consider is the performance penalty if/when the binning process runs out of space, since binning takes a *variable* amount of memory.
What happens if the binning process (I suppose that's "storing and sorting triangles") runs out of memory?
Does the GPU then render the incomplete scene to free memory before accepting additional commands? That would require a Z-buffer as well, I suppose...
 
How do you get a bucket to overflow when you always empty it in time?
 
You can't, but that's an analogy for immediate renderers, I guess.
With a TBDR you can't render a pixel until you know there's no further triangle that "hits" it, which means you can't empty your bucket until you've poured all the water in (finished the scene). That's assuming the TBDR has to operate in a single pass; if not, it needs to create an incomplete picture (plus some information about Z values), and then it can empty the bucket before accepting more water.
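
To make the bucket concrete, here's a minimal toy version of a fixed-size per-tile bin; all names and capacities are invented for illustration, not IMG's actual structures:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a per-tile "bucket" backed by a fixed-size parameter
 * buffer. Sizes and names are invented for the sketch. */
typedef struct { float x, y, z; } Vertex;
typedef struct { Vertex v[3]; }  Triangle;

#define BIN_CAPACITY 4096        /* fixed size: the whole point */

typedef struct {
    Triangle tris[BIN_CAPACITY];
    size_t   count;
} TileBin;

/* Returns false when the bucket is full. A single-pass TBDR cannot
 * simply shade the tile at that point, because a later triangle
 * (e.g. a translucent one on top) could still change the result. */
bool bin_triangle(TileBin *bin, const Triangle *t)
{
    if (bin->count == BIN_CAPACITY)
        return false;            /* caller must flush: partial render */
    bin->tris[bin->count++] = *t;
    return true;
}
```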
 
The 4 cores operate in quite a complicated fashion when it comes to macro- and micro-tiling.

http://worldwide.espacenet.com/publ...T=D&date=20090604&CC=WO&NR=2009068895A1&KC=A1

Or display-list-related patents like this one: http://worldwide.espacenet.com/publ...T=D&date=20090924&CC=WO&NR=2009115778A1&KC=A1
 
And how is multicore related to the problems of overflowing buffers?

How is it unrelated in this particular thread in the first place? You've got 4 GPU cores in the NGP, so what would overflow, how, and why? Do you have same-sized or dynamically sized macro tiles for all of the 4 cores, and one or multiple display lists? I can only imagine they use multi-level display lists (or buffers, or whatever one wants to call them), compress the hell out of them, and store only the absolutely necessary.

The problem is that you can't start (fragment-)processing a single tile unless you know there is nothing, say a translucent triangle above the ones you have in your display list, that affects the outcome.
You are limited in the amount of information you can store before you begin rendering, so either you decide to drop something and hope no one notices, or you render what you have and then accept new data (the extreme example being immediate renderers, or some "hybrid" renderer that only defers as long as there is space).
Don't you think engineers have taken those considerations into account? Heck, IMG has more than a few patents that cover display list control, compression, etc. I can't imagine the display list(s) are so small that they can overflow that easily, and I doubt even more that, if you did manage to overflow one, you wouldn't get an IMR or whatever hybrid into the exact same theoretical trouble.
 
IMG's GPUs do have support for multi-pass rendering; they don't have a problem loading a framebuffer from memory (and storing one to memory) if necessary. I'm almost certain that they'll stop binning and render what they have if they run out of space (including the space for the depth/stencil buffer they now need...).
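
A sketch of what that fallback could look like, reusing the TileBin toy from a few posts up; every function here is a hypothetical stand-in for what the hardware would do, none of the names come from IMG:

```c
/* Hypothetical stand-ins for hardware work; bodies left empty. */
void shade_binned_triangles(TileBin *bin) { (void)bin; }
void store_color_buffer(void)   { /* spill intermediate colour */ }
void store_depth_stencil(void)  { /* spill Z/stencil -- the extra cost */ }

void submit_triangle(TileBin *bin, const Triangle *t)
{
    if (!bin_triangle(bin, t)) {
        /* Bucket full: stop binning and render what we have. */
        shade_binned_triangles(bin);
        /* Spill colour AND depth/stencil so the remaining geometry
         * can be composited correctly later -- the Z-buffer
         * requirement raised earlier in the thread. */
        store_color_buffer();
        store_depth_stencil();
        bin->count = 0;          /* empty the bucket...      */
        bin_triangle(bin, t);    /* ...and accept more water */
    }
}
```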
 
How is it unrelated in this particular thread in the first place? You've got 4 GPU cores in the NGP, so what would overflow, how, and why? Do you have same-sized or dynamically sized macro tiles for all of the 4 cores, and one or multiple display lists? I can only imagine they use multi-level display lists (or buffers, or whatever one wants to call them), compress the hell out of them, and store only the absolutely necessary.
4 cores only means 4 times the buffer size; doesn't change a thing for me. Fixed size is fixed size, and a dynamic workload can exceed it.

Don't you think engineers have taken those considerations into account? Heck, IMG has more than a few patents that cover display list control, compression, etc. I can't imagine the display list(s) are so small that they can overflow that easily, and I doubt even more that, if you did manage to overflow one, you wouldn't get an IMR or whatever hybrid into the exact same theoretical trouble.
I'm pretty sure IMG has chosen a way to deal with this, however unlikely a problem it might be. Certainly devs on a closed platform will be wary of any limits (which are likely high enough that you don't reach them in realtime graphics), but a generic OpenGL driver surely has to take such things into account. And I just asked how they deal with the problem (as I'm curious about the implications).

@Exophase: Thanks. I guess this could be a reason recent PowerVR GPUs can't guarantee order-independent transparency: the output might depend on where the rendering is halted, and the stored intermediate is possibly truncated (less accurate than one-pass rendering).
 
[Attached image: ngprumor.jpg]

Well then..
 
4 cores only means 4 times the buffer size; doesn't change a thing for me. Fixed size is fixed size, and a dynamic workload can exceed it.

Not necessarily 4x the buffer size; unfortunately details are quite sparse, and it takes a lot of guesswork to figure out what exactly they're doing. However, assuming all 4 cores are completely available under one instance: if the scene gets split up into 4 viewports/macro tiles, why would you need 4x the buffer size? It's an honest question. Note that the relevant patent for MP doesn't specify whether one or multiple display lists will be used; it's an either/or option.
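
One way to picture the single-shared-list option (building on the earlier TileBin toy; the tile size and all names are my invention, not a claim about what IMG actually does): if all cores drain disjoint tiles from one binned scene, the list is sized for the scene, not per core.

```c
#include <stdatomic.h>

#define NUM_TILES 510            /* 960x544 at an assumed 32x32 tile size */

typedef struct {
    TileBin tiles[NUM_TILES];    /* one bin per screen tile, built once */
    atomic_uint next_tile;       /* shared cursor all cores pull from   */
} SharedDisplayList;

/* Each of the 4 cores runs this; adding cores changes how fast the
 * tiles drain, not how much memory the list itself needs. */
void core_worker(SharedDisplayList *dl)
{
    unsigned i;
    while ((i = atomic_fetch_add(&dl->next_tile, 1)) < NUM_TILES)
        shade_binned_triangles(&dl->tiles[i]);
}
```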

I'm pretty sure IMG has chosen a way to deal with this, however unlikely a problem it might be. Certainly devs on a closed platform will be wary of any limits (which are likely high enough that you don't reach them in realtime graphics), but a generic OpenGL driver surely has to take such things into account. And I just asked how they deal with the problem (as I'm curious about the implications).

I'd also be interested in an explanation that's as simple as possible (in order to understand it myself), but I suspect they keep it under wraps as some sort of secret sauce.

In any case, if there were situations where, for whatever reason, a DR were forced to operate as an IMR (always in a highly relative sense), one of the advantages it would lose, IMO, would be effective fill-rate, among others. But since GPUs in the embedded space generally don't have excessive fill-rates, I don't see it as a problem. If the NGP GPU is clocked at 200MHz, as the rumors have it, then it has 400 MTexels/s texture and 3.2 GPixels/s z/stencil raw fill-rates per core.

Besides, as Arun already noted, senior Simon's 2bpp & 4bpp PVRTC formats are a blessing, among other things.
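
For what it's worth, those per-core figures fall straight out of the rumored clock, assuming SGX543-class per-clock rates (2 bilinear texels and 16 z/stencil tests per clock; the per-clock rates are my assumption):

```c
#include <stdio.h>

/* Deriving the fill-rate figures quoted above from the rumoured
 * 200MHz clock; the per-clock rates are assumed, not confirmed. */
int main(void)
{
    const double mhz = 200.0;
    printf("texture:   %.0f MTexels/s per core\n", mhz * 2.0);           /* 400 */
    printf("z/stencil: %.1f GPixels/s per core\n", mhz * 16.0 / 1000.0); /* 3.2 */
    return 0;
}
```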
 
I'd also be interested in an explanation that's as simple as possible (in order to understand it myself), but I suspect they keep it under wraps as some sort of secret sauce.
It's not necessarily secret sauce as such, but you're right that we haven't talked about it much in public yet. I'll see about changing that, so there's a bit more information out there about how MP works at the work-distribution and memory-cost level.
 
Actually, I think we can be specific in saying that there is no significant change in memory cost associated with multi-core; I'm not sure why anyone would think there was.

Pretty certain there's been a public talk on multi-core given by Tony King-Smith that explained its operation pretty well.

John.
 
Doesn't the GPU-dedicated RAM need to increase in bandwidth as you increase the number of cores? Not theoretically, of course, but practically.

You couldn't get a "high-end" version of your GPUs to "infinitely" scale linearly with an increasing number of cores without increasing memory bandwidth, right?

Right, but the bandwidth requirement for extra cores is low.
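
Presumably that's because on a TBDR the per-pixel colour/Z traffic stays in on-chip tile memory, so the external traffic left over (parameter reads, texture reads, final framebuffer write) depends on the scene rather than on how many cores shade it. A rough sketch with invented numbers:

```c
#include <stdio.h>

/* Rough illustration: the final framebuffer write is the same no
 * matter how many cores shaded it. The resolution is the NGP's
 * screen; everything else is an assumption for the sketch. */
int main(void)
{
    const double w = 960.0, h = 544.0, fps = 30.0, bytes_px = 4.0;

    double fb_write_mb_s = w * h * bytes_px * fps / 1e6;
    printf("final colour write: %.1f MB/s, independent of core count\n",
           fb_write_mb_s);      /* ~62.7 MB/s at 960x544 @ 30fps */
    return 0;
}
```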

Actually, I think we can be specific in saying that there is no significant change in memory cost associated with multi-core; I'm not sure why anyone would think there was.


I'm not trying to start a flamewar between comrades or anything... but where do we stand, exactly?

Is it "so low" that you consider it "non significant"?
What ratios are we talking about? For each 100% increase in cores, you'll need 10% increase in memory bandwidth? More? Less? Not allowed to specify? :(
 
Series5XT scales up to 16 cores and no more; ditto for Series6/Rogue.

In any case, if they themselves officially claim that past 16 cores it doesn't make sense anymore, then obviously it wouldn't be worth bothering with anything over 16, even in pure theory, for such a hypothetical case.

As for where you stand: uhmm, trust the more experienced of the two ;)
 
ToTTenTranz said:
I'm not trying to start a flamewar between comrades or anything... but where do we stand, exactly?
We said the same thing, just in different words.

Is it "so low" that you consider it "non significant"?
What ratios are we talking about? For each 100% increase in cores, you'll need 10% increase in memory bandwidth? More? Less? Not allowed to specify? :(
No MP config I know of has considered adjusting the memory config to increase bandwidth to help performance versus a single core. We obviously don't want to give away figures (sadly; I'd love to walk you through a profiled frame or frames and discuss the consumers).
 