Can this SONY patent be the PixelEngine in the PS3's GPU?

ERP said:
See, I understand what you are saying, but why do we see so much time in off-line CG ( more or less 30% of rendering time ) spent on texture I/O ?

Because texture I/O is incredibly slow in offline rendering: in the best case it's coming out of main memory, more likely off disk. It's the primary reason RenderMan uses a tile-based texture system (to maximise texture locality).

In a real-time system you're working with somewhat different constraints. Yes, texture latency is still huge compared to ALU ops, but a lot of texture ops in current shaders are there to approximate calculations, not as part of the art assets. Really, how many textures are we likely to see combined on a single pixel?

I've been looking at lighting models that take >25 dot products per pixel, and that doesn't include transforming all the inputs into the right space. Once you start doing lighting at a pixel level and you start looking at better lighting models, you can easily spend hundreds of ALU ops per pixel; texture complexity, even if it weren't constrained by memory, just isn't going to explode like that.
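To put that in perspective, here is a rough sketch in plain C of that kind of per-pixel math (the light count, the Blinn-style specular and every constant are invented purely for illustration); every operation in the loop is ALU work, and none of it touches a texture:

[code]
#include <math.h>
#include <stdio.h>

typedef struct { float x, y, z; } vec3;

static float dot(vec3 a, vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static vec3  add(vec3 a, vec3 b) { vec3 r = { a.x+b.x, a.y+b.y, a.z+b.z }; return r; }
static vec3  scale(vec3 a, float s) { vec3 r = { a.x*s, a.y*s, a.z*s }; return r; }
static vec3  normalize(vec3 a) { return scale(a, 1.0f / sqrtf(dot(a, a))); }

/* Hypothetical per-pixel lighting: N point lights, diffuse + Blinn specular.
   Every light costs several dot products, two normalizes (each a dot product
   plus a reciprocal square root) and a pow -- all ALU work, no texture fetch. */
static vec3 shade_pixel(vec3 P, vec3 N, vec3 V,
                        const vec3 *light_pos, const vec3 *light_col, int num_lights)
{
    vec3 result = { 0.0f, 0.0f, 0.0f };
    for (int i = 0; i < num_lights; ++i) {
        vec3 L = normalize(add(light_pos[i], scale(P, -1.0f))); /* direction to light */
        vec3 H = normalize(add(L, V));                          /* half vector        */
        float ndotl = fmaxf(dot(N, L), 0.0f);
        float ndoth = fmaxf(dot(N, H), 0.0f);
        float spec  = powf(ndoth, 32.0f);                       /* fixed shininess    */
        result = add(result, scale(light_col[i], ndotl + spec));
    }
    return result;
}

int main(void)
{
    vec3 light_pos[4] = { {1,2,3}, {-2,1,0}, {0,3,-1}, {2,-1,2} };
    vec3 light_col[4] = { {1,1,1}, {.5f,.5f,.8f}, {.8f,.6f,.4f}, {.3f,.3f,.3f} };
    vec3 P = {0,0,0}, N = {0,0,1}, V = {0,0,1};
    vec3 c = shade_pixel(P, N, V, light_pos, light_col, 4);
    printf("pixel colour: %.3f %.3f %.3f\n", c.x, c.y, c.z);
    return 0;
}
[/code]

Four lights is already 16 dot products (half of them hidden inside the normalizes), plus the pows; push the light count or the lighting model a little further and you are past 25 easily.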

No, it will not explode like that, but it is still going to increase compared to what we do now, especially because I do not see the jump in math ops per cycle being massive enough to completely eliminate the use of cube maps, 3D textures, etc. as look-ups/shortcuts.

Even if we do eliminate the shortcuts, we are likely to see an increase in texture data usage: no, it will not grow like shader op usage, but it will be a problem that needs to be taken care of.

If we want to support a huge number of math ops per fragment we should aim not only for a large number of ALUs, but also for decent efficiency.

I know that parallelism helps: we can have more pixels in flight, and even if texture data takes a while to get to the ALU, we hide the latency by having so many pixels being processed.

That is the basic idea: an ALU clocked at 100 MHz can do the same work as two ALUs that are each half as efficient.

Depending on how we take care of latency in the APUs, their efficiency will vary.

If texture fetches take lots of cycles then each APU's IPC will go down a lot, and in order to compensate for the lost efficiency we will need more APUs dedicated to Pixel Shading work.

If we can afford the extra APUs, that is fine, but what if the efficiency drop is so high that we cannot afford the extra APUs?
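To put rough numbers on that trade-off (every figure below is invented, the point is only the shape of the calculation): if each unhidden texture fetch stalls an APU for a fixed number of cycles, the effective IPC and the number of APUs needed to win the throughput back fall straight out of a couple of divisions:

[code]
#include <stdio.h>

/* Illustrative only: how texture-fetch stalls eat into per-APU throughput,
   and how many APUs it takes to compensate.  All numbers are made up. */
int main(void)
{
    double clock_hz     = 1.0e9;   /* assumed APU clock                     */
    double alu_ops      = 300.0;   /* ALU ops in a long per-pixel shader    */
    double tex_fetches  = 8.0;     /* texture fetches in the same shader    */
    double stall_cycles = 50.0;    /* assumed stall per unhidden fetch      */

    double ideal_cycles = alu_ops;                      /* 1 op/cycle ideal  */
    double real_cycles  = alu_ops + tex_fetches * stall_cycles;
    double ipc          = ideal_cycles / real_cycles;   /* effective IPC     */

    printf("effective IPC: %.2f\n", ipc);
    printf("pixels/s per APU: ideal %.1fM, with stalls %.1fM\n",
           clock_hz / ideal_cycles / 1e6, clock_hz / real_cycles / 1e6);
    printf("APUs needed to match one stall-free APU: %.1f\n",
           real_cycles / ideal_cycles);
    return 0;
}
[/code]

With those made-up figures a single APU loses more than half of its throughput and you need roughly 2.3 APUs to get back to where one stall-free APU would have been; that is exactly the "can we afford the extra APUs" question.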

We would have the shading power to run those long shaders you mention (for complex lighting models), but in reality we would not be able to run them at decent speed unless we keep texture fetches to a minimum (especially dependent texture reads, which cannot be optimized by the "pack texture fetches and send them early to the Pixel Engines" kind of trick).

I know that a solution can be found (to the potential problem of reasonably long latencies for texture fetches), and we can even look at the current patents for the APUs and this one for the SALC/SALP in order to see what can be done... as I said, something can be done, I am sure, but I am interested in exploring what we can do...
 
Texture fetches create a dependency issue for the rest of the shader ops, and the APU will stall for the cycles it takes the texture fetch to come back... I do not see them set up for OOOe or automatic CMT (switch-on-event MT).

How slow would it be for the APU to DMA the current context (stack and PC) back to shared DRAM (we only have 128 KB of LS per APU) and start processing a new pixel?

I guess that context switching would not be needed when shading pixels from the same primitive; the problem is that if primitives start descending into the 1-4 pixel range in terms of area, then we have to make sure each APU is processing multiple primitives in parallel.

How would that work ? That is an area I am trying to think about.
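One way I can imagine it working, purely as a sketch (nothing here comes from the patents; the context count, fetch latency and op counts are all invented): keep a few pixel contexts resident in the LS and, whenever a pixel issues a fetch, park it and run ALU ops on whichever other pixel is ready instead of stalling. A toy model in C:

[code]
#include <stdio.h>

/* Toy model of hiding texture latency by keeping several pixels "in flight"
   per APU and hand-switching between them.  Nothing here reflects the real
   APU ISA or the patents; latency, context count and op count are made up. */

#define CONTEXTS      4     /* pixel contexts kept resident in local storage */
#define FETCH_LATENCY 20    /* pretend cycles for a texture fetch to return  */
#define OPS_PER_PIXEL 10    /* ALU ops to run once the fetch result arrives  */

typedef enum { WAITING_ON_FETCH, READY, DONE } state_t;

typedef struct {
    state_t state;
    int     fetch_ready_at;   /* cycle when the pending fetch completes */
    int     ops_left;         /* remaining ALU ops for this pixel       */
} pixel_ctx;

int main(void)
{
    pixel_ctx ctx[CONTEXTS];
    int cycle = 0, busy = 0;

    for (int i = 0; i < CONTEXTS; ++i) {   /* every pixel starts with one fetch */
        ctx[i].state = WAITING_ON_FETCH;
        ctx[i].fetch_ready_at = cycle + FETCH_LATENCY;
        ctx[i].ops_left = OPS_PER_PIXEL;
    }

    for (;;) {
        int done = 0, worked = 0;
        for (int i = 0; i < CONTEXTS; ++i) {            /* retire finished fetches */
            if (ctx[i].state == WAITING_ON_FETCH && cycle >= ctx[i].fetch_ready_at)
                ctx[i].state = READY;
            if (ctx[i].state == DONE)
                ++done;
        }
        if (done == CONTEXTS)
            break;
        for (int i = 0; i < CONTEXTS && !worked; ++i) { /* one ALU op on a ready pixel */
            if (ctx[i].state == READY) {
                --ctx[i].ops_left;
                ++busy;
                worked = 1;
                if (ctx[i].ops_left == 0)
                    ctx[i].state = DONE;
            }
        }
        ++cycle;
    }
    printf("total cycles: %d, ALU-busy cycles: %d (utilisation %.0f%%)\n",
           cycle, busy, 100.0 * busy / cycle);
    return 0;
}
[/code]

With one context the same work would be 20 + 10 = 30 cycles at ~33% ALU utilisation; even this naive round-robin over four contexts reaches ~67%, and staggering the fetches or adding contexts pushes it higher. The real question is how cheap that switch can be made on an APU, and what happens when the contexts come from different primitives.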

On the bright side, if we look at the hypothetical VS with 4 PEs and 32 nice little APUs, we see that we potentially have quite a few pixels in flight at the same time, and that with very long shaders, the relatively few cycles added by texture fetch latency will not impact the situation too much: we are using many more math ops than texture fetches, and we should have a fairly high number of math ops not dependent on texture fetches that can run at full speed on the APU.

It is just the freak in me that wants the APUs to run as close to peak performance as possible.
 
Re Subdivs... I've expected the valence thing to be something like this, just wanted to know for sure - thanks for the explanation!

ERP, most of the models I've made are >95% quad polygons, and most of the vertices have a valence of 4, with a few having 3 or 5. A valence of 6 should not happen as there are many ways to avoid it.
I'm not sure about the exact details but I can get you some statistics if you'd like to see them... All in all, I usually like to model 'clean' meshes, because they're generally better to work with later in the pipeline - irregular vertices can easily cause UV stretching, skinning problems, surface distortion and so on.
So, I don't want to judge your artist without seeing his work at all, but maybe that subdiv car could be optimized a little. Then again, I'm almost exclusively working with characters, and I know that cars and other mechanical objects are a bit harder to build with subdivs...
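If anyone wants to gather that kind of statistic themselves, a valence histogram is only a few lines over the face list. The sketch below assumes a closed, all-quad mesh given as four vertex indices per face (a cube stands in for a real model), where each vertex's valence equals the number of faces touching it:

[code]
#include <stdio.h>

/* Valence histogram for a closed all-quad mesh given as 4 vertex indices per
   face.  On a closed quad mesh a vertex's valence (edges meeting at it)
   equals the number of faces touching it, so counting face references per
   vertex is enough for this sketch.  The cube below is just a placeholder. */
#define NUM_VERTS   8
#define NUM_QUADS   6
#define MAX_VALENCE 16

int main(void)
{
    int quads[NUM_QUADS][4] = {          /* a cube: every vertex has valence 3 */
        {0,1,2,3}, {4,5,6,7}, {0,1,5,4},
        {2,3,7,6}, {1,2,6,5}, {0,3,7,4}
    };
    int valence[NUM_VERTS] = {0};
    int histogram[MAX_VALENCE] = {0};

    for (int f = 0; f < NUM_QUADS; ++f)
        for (int c = 0; c < 4; ++c)
            ++valence[quads[f][c]];

    for (int v = 0; v < NUM_VERTS; ++v)
        ++histogram[valence[v]];

    for (int k = 0; k < MAX_VALENCE; ++k)
        if (histogram[k])
            printf("valence %d: %d vertices\n", k, histogram[k]);
    return 0;
}
[/code]

On a character mesh you would expect the bulk of the vertices to land in the valence-4 bucket, with a small tail of 3s and 5s.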
 
Laa-Yosh said:
ERP, most of the models I've made are >95% quad polygons, and most of the vertices have a valence of 4, with a few having 3 or 5. A valence of 6 should not happen as there are many ways to avoid it.

But all modern 3D renderers subdivide quads into triangles anyway, so the practical difference should be minimal. Even storage space should be the same...
 
Guden Oden said:
But all modern 3D renderers subdivide quads into triangles anyway, so the practical difference should be minimal. Even storage space should be the same...

Nope, because triangulation only occurs AFTER the subdivision. I'd expect a realtime implementation to work like this too, at least for Catmull-Clark subdivs, which work best with quad faces.
For triangle-based subdivs there are other schemes, like the butterfly scheme, but these aren't really used, at least in offline rendering.
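A minimal sketch of that ordering in C (not any particular package's pipeline, and it only does the midpoint split, skipping the Catmull-Clark smoothing rules that need neighbouring faces): the control quad is refined into child quads first, and only the final-level quads get cut into triangles for the rasterizer:

[code]
#include <stdio.h>

typedef struct { float x, y, z; } vec3;
typedef struct { vec3 v[4]; } quad;
typedef struct { vec3 v[3]; } tri;

static vec3 midpoint(vec3 a, vec3 b)
{ vec3 r = { (a.x+b.x)*0.5f, (a.y+b.y)*0.5f, (a.z+b.z)*0.5f }; return r; }

static vec3 centroid(const quad *q)
{
    vec3 r = { 0, 0, 0 };
    for (int i = 0; i < 4; ++i) { r.x += q->v[i].x; r.y += q->v[i].y; r.z += q->v[i].z; }
    r.x *= 0.25f; r.y *= 0.25f; r.z *= 0.25f;
    return r;
}

/* Split one quad into four child quads (Catmull-Clark topology: one face
   point, four edge points).  Real Catmull-Clark also smooths the positions
   using neighbouring faces; that part is deliberately left out here. */
static void subdivide(const quad *q, quad out[4])
{
    vec3 f = centroid(q);
    vec3 e[4];
    for (int i = 0; i < 4; ++i)
        e[i] = midpoint(q->v[i], q->v[(i + 1) % 4]);
    for (int i = 0; i < 4; ++i) {
        out[i].v[0] = q->v[i];
        out[i].v[1] = e[i];
        out[i].v[2] = f;
        out[i].v[3] = e[(i + 3) % 4];
    }
}

/* Triangulation happens only AFTER subdivision, on the final-level quads. */
static void triangulate(const quad *q, tri out[2])
{
    out[0].v[0] = q->v[0]; out[0].v[1] = q->v[1]; out[0].v[2] = q->v[2];
    out[1].v[0] = q->v[0]; out[1].v[1] = q->v[2]; out[1].v[2] = q->v[3];
}

int main(void)
{
    quad base = { { {0,0,0}, {1,0,0}, {1,1,0}, {0,1,0} } };
    quad kids[4];
    tri  tris[8];

    subdivide(&base, kids);                  /* 1 quad -> 4 quads       */
    for (int i = 0; i < 4; ++i)
        triangulate(&kids[i], &tris[i * 2]); /* 4 quads -> 8 triangles  */

    printf("1 control quad -> 4 subdivided quads -> 8 triangles\n");
    return 0;
}
[/code]

The point being: the subdivision rules only ever see quads; triangles appear at the very last step, purely for the rasterizer's benefit.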
 
I think the ps3 should use 64 or 128-bit HDR rendering methods, supported by the latest PC graphics cards. High Dynamic Range rendering is the way to the future (Unreal Engine 3 has it too). We could be seeing CGI graphics in real time
 
qwerty2000 said:
I think the ps3 should use 64 or 128-bit HDR rendering methods, supported by the latest PC graphics cards. High Dynamic Range rendering is the way to the future (Unreal Engine 3 has it too). We could be seeing CGI graphics in real time

HDR on its own is not gonna show you CGI-level graphics in real time. It helps, but there's much more to it than that ;)
 
I thought I'd do a quick recap of the diagrams from the various Sony patents hinting at the PS3's GPU... Strangely there are three FIG.6's (666 :devilish: :oops: )

I'll call these DIAGs A-C in order of patent application,

DIAG A from Cell patent

BE_VS.jpg


DIAG B from SALC/SALP patent

pixelengine.jpg


DIAG C from Rendering patent

GPU-block.jpg


Okay....


Panajev2001a said:
nAo wrote:
Jaws wrote:

Would it really matter if the APUs doing all the shading ops (vertex and pixel) were replaced by, say, vertex shading via APUs and pixel shading via these SALPs in the PixelEngine? Indeed, maybe the GPU is without APUs entirely and built from these SALPs...
SALPs can effectively replace those parts of a GPU devoted to rasterizing and zbuffer/alpha/stencil tests. Shading is work for the mighty APUs.

Uh-oh... so we might have found the building block for the Pixel Engines... thanks to Sony for the fun treasure hunt game.

Where's the treasure! :D

What I don't get is that the GPU in DIAG A has separate pools of eDRAM and image cache (VRAM?) but the GPU in DIAG C has a shared pool of image memory (VRAM?). Are these two differing GPUs? I.e. the GPU in DIAG C seems to be without APUs and built solely with SALPs? :?

The GPUs from A and C seem to be on differing buses as well? :?
 
london-boy said:
qwerty2000 said:
I think the ps3 should use 64 or 128-bit HDR rendering methods, supported by the latest PC graphics cards. High Dynamic Range rendering is the way to the future (Unreal Engine 3 has it too). We could be seeing CGI graphics in real time

HDR on its own is not gonna show you CGI-level graphics in real time. It helps, but there's much more to it than that ;)

I know the ps3 will have a much better rasterizer than the ps2 (when I say better I mean like 100-200x better). This time the ps3 will use FSAA, raytracing, displacement mapping, anisotropic filtering, and more. But HDR is going to help a lot.
 
qwerty2000 said:
I know the ps3 will have a much better rasterizer than the ps2 (when I say better I mean like 100-200x better). This time the ps3 will use FSAA, raytracing, displacement mapping, anisotropic filtering, and more. But HDR is going to help a lot.
How do you know? Please, enlighten us! :)
 
there are many elements needed to create realtime visuals that resemble pre-rendered CGI. no single element (i.e. an effect or set of effects) can give you that. framerate and lighting are critical elements, as is a decent amount of anti-aliasing. it is all the needed visual elements combined that would give the appearance of CGI-like graphics. if you're missing any one of the basic elements, or not enough of something (i.e. geometry, or framerate), you will lose the CGI look.


one thing though, we don't need raytracing to have CGI-ish graphics. even in CGI, raytracing is somewhat sparse, i.e. only some scenes in Toy Story 2 use raytracing. you can get away without having raytracing. in my opinion, even though the upcoming consoles *might* have enough processing power to handle light or modest amounts of raytracing, this is not the generation where raytracing is going to be used very much. it's like texture mapping pre-1994 on the 3D chips used in the 16-bit consoles: they barely had enough power to crank out a few thousand polys, and while texture mapping could be done (SuperFX2 Comanche), it would not be used very much, if at all.

I would not expect heavy amounts of raytracing on complex scenes until PS4-N6-X3.

While I am sure PS3-N5-X2 will be capable of limited raytracing in simple to modest scenes, there wouldn't be enough processing power for CGI level models let alone environment, gameplay, etc., PLUS raytracing. the upcoming consoles will have their hands full, and so will programmers, just trying to give us the complexity we all want. raytracing would be a real strain on these machines.

I think after this upcoming generation, console makers who care about graphics (Sony, MS) will realize, if they do not already, that the only way to make a large leap beyond PS3-X2 that the mainstream consumer will notice is to have things like raytracing and all the cool global illumination stuff. Sony and MS will probably be looking into a combination of hardware raytracing units and software hacks that reduce the computational time for raytracing. just a wild theory here.

ok enough rambling for this post.
 
Jaws, look at Fig. 6 of the Rendering patent; the whole figure would seem to be a Processor Element to be used in the Visualizer:

the Image Memory would be the Image Cache and main Memory would be the Shared DRAM ( e-DRAM ).

The Parallel Rendering Engine might be realized with the APUs + SALPs.

That picture might not show the PE of the Visualizer directly, but it can be stretched to encompass it (you can think about which elements of a Visualizer PE would fit in that schematic), as the text of the patent seems to go far beyond what the GSCube proposed, also in terms of distributed rendering over a network.

It might have started as an off-shoot of the GSCube research and got extended, with ideas from it considered for use in a CELL-based platform.
 
by the way, since one of this thread's main topics is fillrate, I thought I'd remind everyone of GSCube's fillrate ^__^


16 processor GSCube

37.7GB/s (2.36 Gpixels/s x 16)

http://ps2.ign.com/articles/082/082490p1.html

note that the Graphics Synthesizers are clocked just under 150 MHz, just in case anyone was wondering where this figure comes from.

16 GSs * 16 pixel engines/pipes * 147-point-something MHz (I didn't figure the exact clocking of the GSs here, I suck at math)

edit: it was right there, under my nose, 147.456MHz :oops:

anyway, the point is, the 16-processor GSCube has an untextured fillrate of almost 38 gigapixels/sec. obviously 1 texture (plus bilinear filtering) cuts that in half to 19 gigapixels/sec (18.85 to be near-exact)

edit: naturally, the 64-processor GSCube had an untextured fillrate of about 150 gigapixels/sec, or about 75 gigapixels/sec textured.
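For anyone who wants to check the arithmetic, here it is spelled out (the GS clock and pipe counts are the published figures; the halving with one bilinear texture layer is the usual GS behaviour):

[code]
#include <stdio.h>

/* Fillrate arithmetic for the GSCube figures quoted above. */
int main(void)
{
    double gs_clock = 147.456e6;              /* Hz, Graphics Synthesizer clock */
    double pipes    = 16;                     /* pixel engines per GS           */
    double per_gs   = gs_clock * pipes;       /* ~2.36 Gpix/s per GS            */

    double cube16   = per_gs * 16;            /* 16-GS GSCube                   */
    double cube64   = per_gs * 64;            /* 64-GS GSCube                   */

    printf("per GS:    %.2f Gpix/s\n", per_gs / 1e9);
    printf("GSCube 16: %.1f Gpix/s untextured, %.1f textured\n", cube16 / 1e9, cube16 / 2e9);
    printf("GSCube 64: %.1f Gpix/s untextured, %.1f textured\n", cube64 / 1e9, cube64 / 2e9);
    return 0;
}
[/code]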

And now we have hints that PlayStation 3's GPU might be producing tens of billions of pixels/sec, so it seems to me that PS3 might at least rival the 16-processor GSCube in terms of raw fillrate, if not surpass it. something that would not be true with the previous Visualizer estimates of under 10 gigapixels/sec (from 4 pipes, 1 per PE, at 1-2 GHz). And of course, much more work will be done on PS3 pixels than was done on GSCube pixels, thanks to the APUs in the Visualizer that the GS did not have. but my post here was really simply about raw pixel and textured-pixel fillrate.

now that PS3's fillrate at least *seems* to be falling into place nicely, we can worry about RAM memory :oops:
 
Megadrive1988 said:
And now we have hints that PlayStation 3's GPU will be producing tens of billions of pixels/sec.

If you'd replace "will" with "might" (or equivalent), I'd feel a lot more comfortable. :p

Anyway, I think it's pretty much out of the question to have a GPU doing tens of gpix/s. It's completely idiotic just to have ten gpix of fillrate, because we don't NEED that many pixels filled to begin with in a console. Much less TENS of gpixes...

Let's just say I would be highly surprised if this were to be true. I'd be happy with a part doing 2x PS2 untextured fillrate, but with textures, providing it has powerful pixel shading capabilities as well. THAT is where the bottleneck will be; having a supadupa rasterizer that can draw hundreds of polygon layers per frame that will never be seen is just uselessly burning power for no reason.
 
And now we have hints that PlayStation 3's GPU will be producing tens of billions of pixels/sec.


It's quotes like this that really scare me when talking about PS3....

Why would you need 10's of gigapixels?
Does this imply a GS-like solution where you store all your intermediate calculations in the frame buffer?

FWIW I really hope not!
 
Why would you need 10's of gigapixels?

Well, I am not saying we need tens of billions of pixels, I am saying that the patent seems to hint at PS3's GPU having that level of fillrate.

If you'd replace "will" with "might" (or equivalent), I'd feel a lot more comfortable.

done 8)


edit: GSCube 16 did have tens of billions of untextured pixels per second and just shy of twenty billion textured. GSCube 64 had a hundred and fifty billion untextured and seventy-five billion textured pixels per second 8)

SGI's Ultimate Vision systems are meant to have up to 40 or 80 billion pixels/sec (very high quality pixels, I might add), and UV's purpose is realtime applications.

so there are uses for that much fillrate. otherwise GSCube and SGI Ultimate Vision would not have been made. I don't think there is such a thing as overkill, even for games.

yes, I do realize that what is done to each pixel might matter more than just how many pixels we can get.
 
Panajev2001a said:
Jaws, look at Fig. 6 of the Rendering patent; the whole figure would seem to be a Processor Element to be used in the Visualizer:

the Image Memory would be the Image Cache and main Memory would be the Shared DRAM ( e-DRAM ).

The Parallel Rendering Engine might be realized with the APUs + SALPs.

That picture might not show the PE of the Visualizer directly, but it can be stretched to encompass it (you can think about which elements of a Visualizer PE would fit in that schematic), as the text of the patent seems to go far beyond what the GSCube proposed, also in terms of distributed rendering over a network.

It might have started as an off-shoot of the GSCube research and got extended, with ideas from it considered for use in a CELL-based platform.

I suppose it could be stretched to fit the 4 VS GPU, but I still find it odd that the GPU is hooked off the BE in Diag A but off the main bus in Diag C. Something just doesn't add up there? :?

All things equal with the BE and VS, which bus layout seems the most efficient, Diag A or Diag C?

Panajev2001a said:
Texture fetches create a dependency issue for the rest of the shader ops, and the APU will stall for the cycles it takes the texture fetch to come back... I do not see them set up for OOOe or automatic CMT (switch-on-event MT).

How slow would it be for the APU to DMA the current context (stack and PC) back to shared DRAM (we only have 128 KB of LS per APU) and start processing a new pixel?

I guess that context switching would not be needed when shading pixels from the same primitive; the problem is that if primitives start descending into the 1-4 pixel range in terms of area, then we have to make sure each APU is processing multiple primitives in parallel.

How would that work ? That is an area I am trying to think about.

On the bright side, if we look at the hypothetical VS with 4 PEs and 32 nice little APUs, we see that we potentially have quite a few pixels in flight at the same time, and that with very long shaders, the relatively few cycles added by texture fetch latency will not impact the situation too much: we are using many more math ops than texture fetches, and we should have a fairly high number of math ops not dependent on texture fetches that can run at full speed on the APU.

It is just the freak in me that wants the APUs to run as close to peak performance as possible.

I wouldn't worry too much. ;) I thought devs wouldn't need to touch the metal of PS3? This would be the nightmare task of the STI engineers developing the compiler/CELL OS, wouldn't it?

That is one of my biggest concerns: that it wouldn't be mature at launch, i.e. inefficient/buggy... and that the final output displayed onscreen, say, wouldn't be any different from the earlier-released XBOX2 with its perhaps more mature XNA environment?

Megadrive1988 said:
Quote:
Why would you need 10's of gigapixels?

Well, I am not saying we need tens of billions of pixels, I am saying that the patent hints at PS3's GPU having that level of fillrate.

Quote:
If you'd replace "will" with "might" (or equivalent), I'd feel a lot more comfortable.

done

I think the same argument applied back in 1998: if someone had said PS2 would have 2.4 Gpix/s, a WTF would have been applied! ;) Six years later in 2004, simple Moore's law should take us above 30 Gpix/s, otherwise I'd sack my R&D team! :D All those billions of Yen...

On the subject of what we'd do with 10's of Gpix/sec: since the entire GPU is programmable, including the pixel engine, would we not be able to implement a Reyes pipeline? Or other exotic delights. :D
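And the napkin version of that Moore's law argument, assuming the usual doubling-every-18-months rule of thumb (pick a different starting year or doubling period and the number moves around a bit):

[code]
#include <stdio.h>
#include <math.h>

/* Napkin math for the "Moore's law says 30+ Gpix/s" argument above. */
int main(void)
{
    double gs_fillrate = 2.4e9;  /* PS2 GS untextured fillrate, pixels/s */
    double years       = 6.0;    /* roughly 1998/99 design -> 2004/05    */
    double doublings   = years / 1.5;
    double projected   = gs_fillrate * pow(2.0, doublings);

    printf("%.0f doublings -> %.1f Gpix/s\n", doublings, projected / 1e9);
    return 0;
}
[/code]

Four doublings on 2.4 Gpix/s lands at 38.4 Gpix/s, which is where the "above 30 Gpix/s" hand-wave comes from.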
 
I think the same argument applied back in 1998: if someone had said PS2 would have 2.4 Gpix/s, a WTF would have been applied! Six years later in 2004, simple Moore's law should take us above 30 Gpix/s, otherwise I'd sack my R&D team! All those billions of Yen...

On the subject of what we'd do with 10's of Gpix/sec: since the entire GPU is programmable, including the pixel engine, would we not be able to implement a Reyes pipeline? Or other exotic delights.

Alright, you missed my point...

My point is: what am I trading for those tens of billions of pixels? Increasing the die area to increase fillrate means that die area can't be used for, say, more ALU blocks, or better texture filtering, etc.

How useful is 10 billion pixels per second if you're entirely limited by ALU speed? The statement is more about balance than it is about fillrate.

My concern about PS3 in general is exactly what Sony will leave off the die for cost reasons. I have to assume we'll get decent texture filtering that works this time, and I have to assume we'll have a complete set of blending ops, but I do worry that they might decide on a "novel" architecture and then have someone who doesn't understand it cut significant features for cost reasons... IMO this is what happened to the GS.

When it comes to system performance the devil is in the details, and it's the details of what I'm not seeing or hearing that worry me. We'll know soon enough, and it's not like I have any control over it, so...
 