Thoughts about the new VRAM configuration for the PSP

From:

PSP Graphics Core

1-166MHz (1.2V)
256-bit bus
2MB eDRAM (VRAM)
5.3GB/s bus bandwidth
664 million pixels per second pixel fill rate
3D curved surface and 3D polygon engine
Support for compressed textures, hardware clipping, morphing, bone, tessellation, bezier, b-spline (NURBS)
Maximum of 33 million polygons per second
24-bit full color (RGBA)

http://www.gamespot.com/all/news/news_6072659.html



To:

ext_tech.jpg


Source: http://www.extremetech.com/slideshow_viewer/0,2393,l=&s=201&a=133950&po=2,00.asp


(166 MHz * 512 bits) / (8 bits/byte) ≈ 10.4 GB/s

I wonder what exactly they did.

Did they add another bus to the VRAM to be able to do parallel READ and WRITE operations, sort of like what we have on the GS for the frame-buffer?

Did they use the doubled bus width for textures (to speed up filtering)?



doncale said:
Interesting update, Panajev. We know that PS2's GS has approx. 48 GB/sec of bandwidth, which it needs to feed 16 pixel engines/pipelines. The PSP only has 4 pipelines to feed, so 10.4 GB/sec seems pretty decent.

10.4 * 4 = 41.6 GB/s

Very close in terms of bandwidth: I suspect that more bandwidth is being pulled by the Texture Units than by the Pixel Engines.
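
Spelling out the back-of-the-envelope numbers (a rough sketch using only the figures quoted in this thread, nothing official):

# Rough per-pipeline bandwidth comparison, using the figures quoted in this thread.
gs_bw_gb_s, gs_pipes = 48.0, 16      # PS2 GS: ~48 GB/s of eDRAM bandwidth, 16 pixel engines
psp_bw_gb_s, psp_pipes = 10.4, 4     # PSP GPU (new spec): ~10.4 GB/s, 4 pixel engines
print(gs_bw_gb_s / gs_pipes)         # 3.0 GB/s per pipeline on the GS
print(psp_bw_gb_s / psp_pipes)       # 2.6 GB/s per pipeline on the PSP
print(psp_bw_gb_s * 4)               # 41.6 GB/s if naively scaled up to 16 pipelines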

GS' e-DRAM had two busses that connected the Pixel Engines to the frame-buffer area and one smaller bus for texture data.

Each DRAM macro on the GS had basically three busses.

PlayStation 2's GS: 2,560 = 16 * 64 (from FB) + 16 * 64 (to FB) + 16 * 32 (from TB)

PSP's GPU: 512 bits.

http://www.extremetech.com/slideshow_viewer/0,2393,l=&s=201&a=133950&po=9,00.asp

If this is the same DRAM macro, we have 2 of these in the GPU and 2 of these for the Media Engine, with the difference that the Media Engine's e-DRAM would use a single 128-bit interface for the two banks in the DRAM macro, while the GPU uses a 128-bit interface for each bank.

128 bits * 2 banks * 2 DRAM macros = 512 bits.
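
A quick sketch of the bus-width sums above (the PSP macro/bank split is my speculation, as noted):

# Bus widths, in bits, as discussed above.
gs_total = 16 * 64 + 16 * 64 + 16 * 32   # GS: FB read + FB write + texture read = 2560 bits
psp_total = 128 * 2 * 2                  # PSP (speculated): 128 bits/bank * 2 banks * 2 macros = 512 bits
print(gs_total, psp_total)               # 2560 512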

Pretty nice that they went and modified that part of the specs.

I think they did not have enough bandwidth for bi-linear filtering to be done in a single cycle, and they were faced with the choice between 332 MPixels/s of fill-rate with bi-linear filtering on, or having to majorly re-work the Rendering core to add the appropriate resources (read: a bigger texture cache) to allow bi-linear texture filtering to be done in a single cycle.

Before:

32-bit RGBA (PSP can do 32-bit rendering) + 32-bit Z (or 24-bit Z + 8-bit Stencil) = 64 bits per Pixel Engine.

64 * 4 = 256 bits

Maybe they had different Pixel Engines in mind, thus fitting texture data into 256 bits.

Even thinking about nice caches in the Rendering core like the GS has, we need at least 64 bits of texture data per cycle (4x16-bit texels).

That would leave you with no bandwidth at all for textures; you would have to pull a GS and re-use the Pixel Engines as TMUs.

With a 512 bits bus you can have:

4 Pixel Engines each pulling/pushing 64 bits of data for a total of 256 bits

4 TMUs loading 64 bits of data each for a total of 256 bits.

This allows single-cycle bi-linear with 16-bit texels without texture cache access (the GS filters from the texture cache).
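
To make the budget explicit, here is a small sketch of the per-cycle bit counts I am assuming (pure arithmetic, nothing official):

# Per-cycle bit budget for the speculated 512-bit configuration.
pixel_engines, tmus = 4, 4
color_bits, z_bits = 32, 32               # 32-bit RGBA + 32-bit Z (or 24-bit Z + 8-bit stencil)
texel_bits, bilinear_taps = 16, 4         # 16-bit texels, 4 samples per pixel for bi-linear
pe_bits = pixel_engines * (color_bits + z_bits)    # 4 * 64 = 256 bits for the Pixel Engines
tmu_bits = tmus * texel_bits * bilinear_taps       # 4 * 64 = 256 bits for the TMUs
print(pe_bits, tmu_bits, pe_bits + tmu_bits)       # 256 256 512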

On the GS you achieve single cycle bi-linear only if your texture fits the 8 KB Texture Cache on the GS Rendering core: if you break it (texture is too large) the cache is re-filled at 150 GB/s or 8 KB per cycle.

The PSP GPU could do things like this:

If the texture fits the cache you can do bi-linear in a single cycle; if you use 16-bit textures then you could pull them from the VRAM instead of the texture cache and still not drop to 2 cycles for bi-linear filtering.

This means that you should not expect big textures to kill the PSP GPU if they use 16-bit color depth (you can still produce a 32-bit result from 4x16-bit samples).
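
As a side note, a quick sketch of how large a square texture fits in an 8 KB buffer at various texel depths (plain arithmetic; the GS's actual page dimensions depend on the pixel format, so treat these as ballpark figures):

# How large a square texture fits in an 8 KB buffer at different texel depths.
buffer_bits = 8 * 1024 * 8
for bits_per_texel in (32, 16, 8, 4):
    texels = buffer_bits // bits_per_texel
    side = int(texels ** 0.5)             # side of a square texture with that many texels
    print(bits_per_texel, texels, side)   # 32: ~45x45, 16: 64x64, 8: ~90x90, 4: 128x128
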
BTW, a note on Flipper (GCN's GPU).

GCN's Flipper can only do single-cycle tri-linear with 16-bit texels: it takes two cycles with 32-bit texels, IIRC.

The TEV (I think it is the TEV that does texture filtering) can load 32x16-bit texels (a 512-bit texture interface), and 32 texels are enough for 4 pixel pipelines (tri-linear needs 8 input texels).
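
A quick sketch of that texel budget (the interface width and texel counts are the ones claimed above, from memory, so treat them as assumptions):

# Flipper-style texel budget per cycle, based on the figures claimed above.
pipelines, trilinear_taps = 4, 8
interface_bits = 512
bits_16 = pipelines * trilinear_taps * 16          # 512 bits of 16-bit texels
bits_32 = pipelines * trilinear_taps * 32          # 1024 bits of 32-bit texels
print(-(-bits_16 // interface_bits))               # 1 cycle
print(-(-bits_32 // interface_bits))               # 2 cycles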
 
An interesting 'beast'...8) I haven't really been following this but what do they mean by the 'Surface Engine' in that image? Do they mean 'Geometry Engine' or 'Vertex Shader' by that?

Sooner or later Sony are gonna run out of 'Engines' to tag! I'm waiting for them to apply a 'light' engine (TM) or something next! :p
 
Jaws said:
An interesting 'beast'...8) I haven't really been following this but what do they mean by the 'Surface Engine' in that image? Do they mean 'Geometry Engine' or 'Vertex Shader' by that?

http://www.extremetech.com/article2/0,1558,1639250,00.asp
Sony apparently will support a graphics model based on surfaces, rather than polygons. Okabe displayed an illustration of a cartoon character that looked more realistic than a polygon-based model, which he said contained the same amount of data. The graphics block will also be capable of vertex blending, a morphing technology that can interpolate changes made between objects.

(attached: two slide images from the ExtremeTech article)
 
Jaws said:
One, thanks... a NURBS Engine = Surface Engine then?

It is a hardware-based curved-surface generator and tessellator which supports Bezier, b-spline/NURBS, clipping and so on. It reduces the input data sent to the GPU and saves overall memory usage and bandwidth. Presumably it's associated with the initial 8MB memory spec, but considering ports of traditional programs built on the polygon/vertex model, it seems SCE compromised on it for developers' ease. So 1st-gen PSP games may not look very good until programmers get used to this new model, which is rarely seen in PC graphics.

OTOH curved surfaces are useful for abstracting GPU power: when the Surface Engine is supported on other weaker/stronger hardware, such as a cell phone or PS3, you can recycle the same model data on it.
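
To give a rough sense of the data savings (purely illustrative numbers, not from any PSP documentation):

# Illustrative data sizes: one bicubic Bezier patch vs. the mesh it tessellates into.
bytes_per_vertex = 3 * 4                           # x, y, z as 32-bit floats (positions only)
patch_bytes = 4 * 4 * bytes_per_vertex             # 4x4 control points = 192 bytes sent to the GPU
grid = 16                                          # tessellate to a 16x16 quad grid
mesh_bytes = (grid + 1) ** 2 * bytes_per_vertex    # 289 vertices = 3468 bytes, before indices
print(patch_bytes, mesh_bytes, mesh_bytes / patch_bytes)   # 192 3468 ~18x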
 
Panajev said:
If the texture fits the cache you can do bi-linear in a single cycle; if you use 16-bit textures then you could pull them from the VRAM instead of the texture cache and still not drop to 2 cycles for bi-linear filtering.
If caches worked by having to store the entire data structure to get good speed, they'd be all but useless. But anyway, why would anyone want to use 16-bit textures? Do you expect that the PSP will not support any compressed format at all, not even CLUT?


one said:
It reduces the input data sent to the GPU and saves overall memory usage and bandwidth. Presumably it's associated with the initial 8MB memory spec, but considering ports of traditional programs built on the polygon/vertex model, it seems SCE compromised on it for developers' ease.
While they have their uses, curves aren't a very good general geometry representation.
In regards to ports from PS2, you'd be looking at a ~7:1 ratio in memory (28:4 after you subtract the size of the executable). If they wanted to go with a smaller memory footprint, using a CPU with a stripped-down ISA and a smaller memory footprint would be the place to start, not forcing geometry representations that don't really work that well.
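
(Roughly speaking, and purely as an illustration of where a 28:4 split could come from, assuming PS2's 32 MB against the initial 8 MB PSP spec with about 4 MB of executable on each side; the exact split is a guess, not something stated here:)

# Speculative breakdown of the ~7:1 figure (assumed numbers, see above).
ps2_ram_mb, psp_ram_mb, exe_mb = 32, 8, 4
ps2_data = ps2_ram_mb - exe_mb       # 28 MB left for data on PS2
psp_data = psp_ram_mb - exe_mb       # 4 MB left for data on PSP
print(ps2_data, psp_data, ps2_data / psp_data)   # 28 4 7.0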
 
Fafalada said:
In regards to ports from PS2, you'd be looking at a ~7:1 ratio in memory (28:4 after you subtract the size of the executable).

Ok, explanations are required, if you don't mind Faf!! :?

In your ratio, the 7 is the main memory pool of what? PS2?
BTW, doesn't the PSP support the almighty S3TC? Doesn't that give the PSP a slight advantage?
And how would 32MB + 4MB be so different compared to 32MB + 2MB + 2MB?
Is the rumor about a big chunk of the memory being squatted by the OS/kernel true? Or are you counting in your ratio the fact that the PS2 could stream data "for free", while the same can't be done on the PSP due to power consumption restrictions?
And why does Donald Duck wear a towel when getting out of the shower when he usually doesn't even wear pants?
 
Vysez said:
In your ratio, the 7 is the main memory pool of what? PS2?
It was a reference to the 8MB initial spec and what was said about it.

Anyway, I was just disagreeing with the memory expansion being some kind of help-out for 'lazy' developers. Mostly because it makes the underlying suggestion that PS2 games don't utilize any data-saving techniques whatsoever, so that the PSP having something hardwired would give it massively larger potential in that area.

And why does Donald Duck wear a towel when getting out of the shower when he usually doesn't even wear pants?
That's a good question, but I think only Sony engineers could properly answer it. :LOL:
 
Thanks for the answer, Faf. :D

Fafalada said:
That's a good question, but I think only Sony engineers could properly answer it. :LOL:

...Mhh, I don't know. Imgtec's engineers have years of experience in the field of ducks and the like, so we might ask Simon (if he has some time to spare before preparing the daily litres of tea needed by Imgtec).
Maybe the Dreamcast has some sort of a "Donald_Duck_wearing_a_towel_Paradox buffer". :oops:
 
Fafalada said:
Panajev said:
If the texture fits the cache you can do bi-linear in a single cycle; if you use 16-bit textures then you could pull them from the VRAM instead of the texture cache and still not drop to 2 cycles for bi-linear filtering.
If caches worked by having to store the entire data structure to get good speed, they'd be all but useless.

No, but they need to be big enough to hold the "computational cluster" (sorry, going by the notion of locality and of memory accesses clustering in different regions).

Is this how you feel, though, about the GS Texture Cache/Buffer?

You cannot tell me you get single-cycle bi-linear filtering with textures that pass the 8 KB limit, as it fills 8 KB per cycle: you go to, say, 10 KB or even 9 KB and the texture will break the page.

I know that it depends on where the texels you need to sample are, but say that we need a part of the texture which is not in the page: we need to load the rest, and that will take a cycle.

Ok, statistically it will not happen often enough to halve the fill-rate, especially if programmers keep textures under control.

In some fields that's how it happens, though: you grow the caches more and more to make sure you can fit the core of your working set, so you can worry much less about main memory latency (see Intel's plans to put >24 MB of SRAM cache in future Itanium designs at 90 nm).

But anyway, why would anyone want to use 16-bit textures? Do you expect that the PSP will not support any compressed format at all, not even CLUT?

Did I say 16 bits uncompressed ?

Can I hope that you can load compressed textures from VRAM and decompress them on the Rendering core before the input texels are needed?

On GCN we have the same dilemma with tri-linear: for 32-bit texels you need 2 cycles. It is true that your fill-rate halves if you use FSAA, so you hide the extra latency there... still, the issue is there; some developer might not want to use FSAA.



Why do you think they doubled the bus bandwidth?

Not enough bandwidth for the textures too? Couldn't a single 128-bit texture bus, or less, have been enough?

After all, if we are filling a cache from which the real filtering and texturing will be performed, then we do not need an ultra-wide bus filling it: the Pentium 4's L2 is 256 bits wide and is fed by a 64-bit data bus.

They had 64 bits of bandwidth per pixel pipeline: the NV40 uses a 256-bit path to the VRAM and it has to feed 16 Pixel Pipelines (and more).

If you did the same simple division in the NV40's case you would get (I know I am kinda doing something very fishy) 16 bits per pipeline.

With good enough caches and only 4 Pixel Pipelines, a 256-bit bus should have been quite good.
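
The same crude division, spelled out (it ignores clocks, caches and compression, so take it as nothing more than a sanity check):

# Crude bits-per-pipeline-per-clock comparison (ignores clocks, caches and compression).
psp_old_bus, psp_pipes = 256, 4
nv40_bus, nv40_pipes = 256, 16
print(psp_old_bus // psp_pipes)      # 64 bits per pipeline per clock on the original PSP spec
print(nv40_bus // nv40_pipes)        # 16 bits per pipeline per clock on NV40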

You know what? I think I know what the Rendering core of the PSP's GPU is... something that SCEI called GS 1.5 (an upgraded rendering core), or probably a more enhanced version of it (~GS2), with the appropriate scaling due to the nature of portable machines (fewer pipelines, etc.).
 
Panajev said:
You cannot tell me you get single-cycle bi-linear filtering with textures that pass the 8 KB limit, as it fills 8 KB per cycle: you go to, say, 10 KB or even 9 KB and the texture will break the page.
First, the page buffer is not a cache per se. Second, you have the wrong idea about how they work.

Just like with cache misses on, say, NV2A, page breaks will only cause slower rendering when they happen too frequently for the rendering pipeline to hide the latencies.
I can have a 1024x1024x32-bit texture, and if my UVs are mapped friendly enough with respect to the page layout, I can still maintain maximum pixel rendering speed.

Did I say 16 bits uncompressed ?
Is there any other kind? Basically all popular compressed texture formats average 4-8 bits/texel, and most of them are higher quality than 16-bit RGB, for that matter.

Can I hope that you can load compressed textures from VRAM and decompress them on the Rendering core before the input texels are needed?
Is there some way to decompress data before you actually read it from memory? :? *scratching head*

On GCN we have the same dilemma with tri-linear: for 32-bit texels you need 2 cycles.
That only matters if you assume that the PSP uses the eDRAM directly, with no texel cache in between. In the case of the GC the embedded memory IS a cache; can we be sure that it's the same with the PSP?
 
Fafalada said:
Panajev said:
You cannot tell me you get single-cycle bi-linear filtering with textures that pass the 8 KB limit, as it fills 8 KB per cycle: you go to, say, 10 KB or even 9 KB and the texture will break the page.
First, the page buffer is not a cache per se. Second, you have the wrong idea about how they work.

Just like with cache misses on, say, NV2A, page breaks will only cause slower rendering when they happen too frequently for the rendering pipeline to hide the latencies.
I can have a 1024x1024x32-bit texture, and if my UVs are mapped friendly enough with respect to the page layout, I can still maintain maximum pixel rendering speed.

Which is what I thought I said here ;):

I know that it depends on where the texels you need to sample are, but say that we need a part of the texture which is not in the page: we need to load the rest, and that will take a cycle.

Ok, statistically it will not happen often enough to halve the fill-rate, especially if programmers keep textures under control.

Anyway I was prolly being messy as usual. Thanks for the clear explanation :).

Did I say 16 bits uncompressed ?
Is there any other kind? Basically all popular compressed texture formats average 4-8 bits/texel, and most of them are higher quality than 16-bit RGB, for that matter.

Can I hope that you can load compressed textures from VRAM and decompress them on the Rendering core before the input texels are needed?
Is there some way to decompress data before you actually read it from memory? :? *scratching head*

Doesn't the NV20 store compressed textures in the L2 Texture Cache, but uncompressed texels in the L1 Texture Cache, for example?

I was hoping that the PSP GPU's Texture Cache could hold compressed textures and feed the TMUs from them directly, without having to store the uncompressed texels in some intermediate on-chip storage and feed the TMUs from there.


Can you at least theorize why a 4-pipeline chip rendering to a 480x272 screen would need a 512-bit memory bus?
 
Panajev said:
Can you at least theorize why a 4-pipeline chip rendering to a 480x272 screen would need a 512-bit memory bus?
I rather wouldn't.
But well, take for example... (32-bit Color + 32-bit Z) * 4 pipelines = 256 read + 256 write...
 
Fafalada said:
Panajev said:
Can you at least theorize why a 4-pipeline chip rendering to a 480x272 screen would need a 512-bit memory bus?
I rather wouldn't.
But well, take for example... (32-bit Color + 32-bit Z) * 4 pipelines = 256 read + 256 write...

Uhm... parallel reads and writes have their benefit, as they keep the GPU able to read data while rendering: you get better efficiency, but if this comes at the cost of 2-cycle bi-linear or of having to use two Pixel Engines as TMUs... :(.

*Shakes fist* I see no bandwidth to load the textures in the cache ;).

Do not tell me that when not texturing it does 2x2 pixel blocks and when texturing it does 2x1 pixel blocks, because then it really is a mini GS 1.5.

I will refuse to believe for now that textured fill-rate is 332 MPixels/s :(.
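
To restate Fafalada's example in numbers and show why it bothers me (just arithmetic on the figures above):

# Fafalada's example: the whole 512-bit bus eaten by frame-buffer traffic alone.
pipes, bus_bits = 4, 512
color_bits, z_bits = 32, 32
read_bits = pipes * (color_bits + z_bits)     # 256 bits of color + Z read back per cycle
write_bits = pipes * (color_bits + z_bits)    # 256 bits of color + Z written per cycle
print(bus_bits - read_bits - write_bits)      # 0 bits per cycle left over for texture fetches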
 
V3 said:
I will refuse to believe for now that textured fill-rate is 332 MPixels/s

I always assumed that was the case. The textured fill rate is half the raw fill rate.

Blast, this would mean that the PSP has quite a lot less fill-rate than the PlayStation 2, even taking the screen resolution into account.

640x448 / 480x272 ~= 2.2

1.2 GPixels/s / 2.2 ~= 545 MPixels/s

Close enough I'd say, but still not the same amount.
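
Spelling the arithmetic out (same figures as quoted above in this thread):

# Scaling PS2's fill-rate down to the PSP's screen resolution.
ps2_pixels = 640 * 448
psp_pixels = 480 * 272
ratio = ps2_pixels / psp_pixels               # ~2.2x more pixels on the PS2 screen
equivalent = 1.2e9 / ratio                    # ~545 MPixels/s would match the GS at PSP resolution
print(round(ratio, 2), round(equivalent / 1e6))    # 2.2 546
# Compare with the PSP's quoted 664 MPixels/s raw and (if true) 332 MPixels/s textured.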

This would be the first time ever I have seen a 2x1 pixel footprint, though.

Ok, I think I know exactly which GPU rendering core the PSP is using (something close to the GS 2 work in progress [back in 2001]), if only I could track down more info about the GS 1.5/GS 2.

GS 1.5 had different e-DRAM macros and few enhancements, if any at all, to the rendering core, while GS 2 had the bulk of the enhancements to the rendering core, IIRC.

http://realworldtech.com/page.cfm?ArticleID=RWT022001001645

The GPU described in that link was, I was told, the so-called GS 1.5: notice how much wider the e-DRAM busses are (~4x).

If I am wrong, hopefully someone will set me straight and teach me something new :).
 
Panajev2001a said:
Blast, this would mean that the PSP has quite a lot less fill-rate than the PlayStation 2, even taking the screen resolution into account.

Thus Surface-Engine-oriented development is an essential thing, relying on the relatively high geometry/T&L performance that enables curved-surface support on the PSP, I guess. Why do you expect the graphics module in a handheld PSP, with less memory and a smaller power budget, to be a direct descendant of a GS 1.5/2 meant for workstations? The PS3 GPU may be another, different line based on Cell, with NURBS in software, higher parallelism, higher polygon performance, etc., but at least I suppose PS2 and PSP are very different in GPU design policy and can't be compared only by performance numbers.

The 'Programming the PlayStation Portable (PSP)' session at the Austin Game Conference is 2 weeks away; then I expect your thirst will be quenched somewhat :D
 
With this thread, this is the first time I have heard any info whatsoever on Graphics Synthesizer 2, other than Sony's much-publicized plans back in 1999 to produce a GS2 by 2002.

Also, is GS 1.5 the same thing as GS I-32?
(GS I-32 is the 32 MB eDRAM version of the GS used in the GSCube.)


I myself never really thought about the PSP using some derivative of the work put into GS2...


And so the PSP really has only a 332 Mpixel *textured* fill-rate. Well, I suppose GameCube advocates can breathe a sigh of relief: they now know that the PSP GPU does not slightly surpass Flipper in textured fill-rate, unlike before, when it seemed that the PSP was slightly ahead of the GameCube's Flipper in that area (PSP's 664 vs Flipper's 648 Mpixels).
 
Panajev said:
*Shakes fist* I see no bandwidth to load the textures in the cache
Why would you have to, though? The majority of GPUs out there don't have the raw bandwidth available from their VRAM to support maximum rendering load.
They rely on caches to minimize bandwidth use and thus maintain the speed.

The GS is one of the few exceptions to that rule, and even there they used some (very) tiny intermediate caches (I don't mean the page buffers) to accelerate a few things.
 
Fafalada said:
In regards to ports from PS2, you'd be looking at a ~7:1 ratio in memory (28:4 after you subtract the size of the executable). If they wanted to go with a smaller memory footprint, using a CPU with a stripped-down ISA and a smaller memory footprint would be the place to start, not forcing geometry representations that don't really work that well.

We saw MIPS64 (PS2) -> MIPS32 (PSP) already.
What can they do other than this for a smaller footprint?
 