"Turn every pixel into a polgyon"-renderer

Hellbinder[CE] said:
At least as far as hardware design goes. The PS2 design is one of the most inefficient, poorly conceived designs of all time.

Don't confuse complicated or hard to program for with an ill-conceived design. It's all about tradeoffs: the PS2 is a very flexible machine at the cost of added difficulty. It probably has the most room for growth of any of the consoles on the market. Just look at the PS2's library today; its 'poor' design is not exactly lagging behind the other machines technology-wise, even though it was the first one out by far.

As someone pointed out before, the PS2 is far outselling the other consoles on the market, which means that no matter how hard the PS2 is to program for, developers will do it.

Hellbinder[CE] said:
Sony is steadily building a reputation of having their heads shoved all the way up their A$$. Like far... past the neckline...

Hellbinder[CE] said:
I predict that Sony has already forgotten the PS2's first couple of years and is now going to launch the single most ill-conceived, hard-to-develop-for, aliased-to-beyond-hell mess anyone has ever seen. The writing is on the wall. There is no need at all for these guys to be taking the steps they are. How is it that they have not learned any lessons at all?

Hellbinder[CE] said:
I simply do not understand what is wrong with those guys at Sony. Don't they even talk to their developers first? Don't they care? Do they want every developer on earth to reinvent the wheel yet again? Why? It makes no sense at all.

I see your... overzealous... attitude extends to consoles as well.
 
Rancidm said:
As someone pointed out before, the PS2 is far outselling the other consoles on the market, which means that no matter how hard the PS2 is to program for, developers will do it.

Well, to be fair, "has to do it" does not mean "does it well." All projects have their schedules, and you can't really invest too much time in a single game, regardless of the market share of its platform.

Of course CELL looks interesting. However, the key point here is development tools. You'll need new development tools ( maybe even new languages, or extensions to existing languages ) for it. That's the interesting part.
 
Per-pixel shading is not a very good idea, really... A pixel is not the unit you want to be shading at; in general it's either too often or not often enough (you don't need per-pixel shading for shadow volumes, and you want sub-pixel shading in procedural shaders). If you want true displacement mapping as well, per-pixel shading starts looking even worse.

If you have the processing power, a micro-polygon (Reyes) architecture starts looking very tasty. The hardware "polygons" are very simple: basically linearly interpolated colour and depth testing. A higher level takes real polygons and vertices and breaks them into micro-polygons based on the shader program (simple point-sampled texturing would use one micro-polygon per texel). This allows for easy per-pixel displacement (just move the micro-polygons) and very simple rasteriser hardware (so you get bucket-loads of fill-rate).

Of course I have no idea if Sony are going down this route, but if they can supply the processing power and the correct hardware it would look lovely. All you really need is a very fast rasteriser, fast general texture lookup and lots and lots of general CPU power (to run the shader-to-micro-polygon converters).
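
To make the split concrete, here is a rough sketch of what such a rasteriser stage could look like ( illustrative C++ only, assuming micro-polygons have already been diced down to roughly pixel size so each one can be splatted as a single depth-tested sample ):

[code]
#include <cstdint>
#include <limits>
#include <vector>

// A micro-polygon after dicing and shading: small enough that one flat
// colour and one depth value is acceptable for the pixel it touches.
struct MicroPoly {
    float x, y, z;          // screen-space position (already projected)
    std::uint32_t colour;   // colour computed by the shading stage upstream
};

struct FrameBuffer {
    int w, h;
    std::vector<std::uint32_t> colour;
    std::vector<float> depth;
    FrameBuffer(int w_, int h_)
        : w(w_), h(h_), colour(w_ * h_, 0),
          depth(w_ * h_, std::numeric_limits<float>::max()) {}
};

// The entire "rasteriser": a depth test and a write. All the heavy
// lifting (tessellation, displacement, shading) happened earlier, on
// general FP hardware, which is why this stage can be tiny and fast.
void splat(FrameBuffer& fb, const MicroPoly& mp) {
    int px = static_cast<int>(mp.x);
    int py = static_cast<int>(mp.y);
    if (px < 0 || px >= fb.w || py < 0 || py >= fb.h) return;
    int i = py * fb.w + px;
    if (mp.z < fb.depth[i]) {   // simple depth test
        fb.depth[i] = mp.z;
        fb.colour[i] = mp.colour;
    }
}
[/code]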
 
Dio said:
I'll believe HDTV when I see it. Because Euro TV standards are so much better than NTSC, it's a non-issue over here - there's no interest and no hardware.

Well, I would like to see 1024 x 576p at 60 fps (PAL60) on a console. 1024 x 576 is native 16:9 on PAL TVs (576 visible lines, and 576 x 16/9 = 1024).
Most widescreen consumer TVs should be able to support this without progressive output, and some should even be able to support it with progressive output. This would even make it possible to surf the net at a decent resolution.
 
DeanoC said:
Per-pixel shading is not a very good idea, really... A pixel is not the unit you want to be shading at; in general it's either too often or not often enough (you don't need per-pixel shading for shadow volumes, and you want sub-pixel shading in procedural shaders). If you want true displacement mapping as well, per-pixel shading starts looking even worse.

If you have the processing power, a micro-polygon (Reyes) architecture starts looking very tasty. The hardware "polygons" are very simple: basically linearly interpolated colour and depth testing. A higher level takes real polygons and vertices and breaks them into micro-polygons based on the shader program (simple point-sampled texturing would use one micro-polygon per texel). This allows for easy per-pixel displacement (just move the micro-polygons) and very simple rasteriser hardware (so you get bucket-loads of fill-rate).

Of course I have no idea if Sony are going down this route, but if they can supply the processing power and the correct hardware it would look lovely. All you really need is a very fast rasteriser, fast general texture lookup and lots and lots of general CPU power (to run the shader-to-micro-polygon converters).

That is interesting, but it seems that it would eat quite a lot of CPU resources, as T&L will be REALLY intensive... What would you say about ~1 TFLOPS and ~1 TOPS ( Integer ) ?

What can you say about your idea in relation to the processor described in this patent ?

http://makeashorterlink.com/?B4DB23903

Basically, we could also run pixel programs, as the Visualizer chip described in the patent ( and in one of the images attached to it ) would be programmable; if we wanted it to operate on pixels, I think it could do it... Of course, it could also work as the simple and fast Rasterizer you are talking about, while the Broadband Engine ( as seen in the patent ) should have enough power to dynamically tessellate visible surfaces into micro-polygons and light them ( hey, we can do deferred T&L... sort the HOSs' control points and tessellate only the visible patches... ).

Out of 1 TFLOPS ( theoretical max ), how much do you think would be left for physics and other FP-intensive game code ? ( And if this also takes a hit on the ALUs, how much would that be ? )

Very quick spec sheet for the Broadband Engine and the Visualizer ( as described in the patent ):

Broadband Engine:

4 PEs:

Each PE has:
8 APUs and 1 PU

PU: RISC processor ( it could be a compact PowerPC derivative )

APU: 4 FP Units, 4 Integer Units ( correct me if I am wrong, but the 4 FP Units and the 4 Integer Units can work in SIMD mode, and each group can deliver a 128-bit result per cycle [Fused Multiply-Add for FP, and Integer ops]; someone pointed out that each FP Unit is itself a SIMD VU, but that seems to contradict the patent, the drawings attached to it, and common sense ), 128 KB of Local Storage ( SRAM ) and thirty-two 128-bit registers ( shared between the 4 FP Units and the 4 Integer Units ).

64 MB of e-DRAM


Visualizer:

4 PEs: each PE has 4 APUs, 1 PU and 1 Pixel Engine + Image Cache + CRTC ( the Pixel Engine, Image Cache and CRTC replace four of the eight APUs you would normally find in a PE... this should be helpful for manufacturing, as the Visualizer could be a "slightly" modified Broadband Engine and they would share lots of functional blocks... ).

Unspecified amount of e-DRAM: I suspect 64 MB ( it would make manufacturing easier, as we could use the same lines for both the Broadband Engine and the Visualizer... ).

FP performance of 1 APU would be 32 GFLOPS, which assumes a 4 GHz clock speed for the APU ( the e-DRAM doesn't have to run at that clock speed; I suspect this figure is meant for the Broadband Engine and that the Visualizer would ship at a lower clock speed ). The target process should be 65 nm, with 45 nm to follow as soon as it is ready for mass manufacturing. 65 nm WILL be ready by mid 2004: Toshiba's engineers, who co-developed this process with Sony ( with ideas from the 100 nm SOI IBM technology which Sony licensed ), affirmed that they are confident they will have fabs ready for early production by March 2004, in time to ramp up and get the fabs ready to mass-manufacture Cell chips for a mid-2005 launch in Japan, with the North American launch following a few months later...
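
For reference, here is the back-of-the-envelope arithmetic behind the 32 GFLOPS and ~1 TFLOPS figures ( counting a Fused Multiply-Add as 2 flops is my assumption, though it is the usual convention ):

[code]
// Peak-FLOPS arithmetic from the patent figures quoted above.
constexpr double clock_ghz = 4.0;        // assumed APU clock speed
constexpr int fp_units_per_apu = 4;      // per the patent, as I read it
constexpr double apu_gflops =
    fp_units_per_apu * 2 * clock_ghz;    // FMADD = 2 flops -> 32 GFLOPS
constexpr int apus_per_pe = 8;
constexpr int pes = 4;
constexpr double be_gflops =
    pes * apus_per_pe * apu_gflops;      // 4 * 8 * 32 = 1024 ~ 1 TFLOPS
[/code]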
 
Hellbinder[CE] said:
At least as far as hardware design goes. The PS2 design is one of the most inefficient, poorly conceived designs of all time.

I can't help but see the irony in this statement, as the PS2's (launched in 2000) equivalent of Vertex Shaders is quite a bit beyond those in even the R300 (2002). Perhaps in DX10 they will finally catch the VUs when it comes to things like vertex creation/destruction.

It's not poorly conceived, just an architecture from 2000 that in a few respects was ahead of its time. You don't need to have a PC-centric architecture; it's not the "only way" to skin the proverbial cat, especially when you're using all custom parts.

Also, Panajev - this style of 3D architecture was talked about on several occasions by SCE engineers around the time of the PS2. In fact, I think the phrase was "Renderman in a chip" or something to that effect (Renderman being the obvious REYES renderer). This would also fall in line with the early PS3-oriented SCE internal R&D projects that used the GSCube to show the feasibility of a concurrency of simple rasterizers with a front end that's heavy in FP power.
 
As DeanoC said, this is not a new idea. Pixar uses REYES (Renders Everything You Ever Saw), a micropolygon-based architecture. The first stage of the rendering pipeline transforms and splits complex high-level primitives, basically NURBS and subdivision surfaces, into smaller ones. It culls the off-screen parts and continues splitting until a primitive is simple enough, then dices it into a grid of micropolygons. Micropolygons are approximately half a pixel on a side in screen space (this corresponds to the famous Nyquist limit). Then the algorithm shades the micropolygon grids in SIMD fashion. In the final stage each grid (until now just a group of vertices) is converted to micropolygons and stochastic sampling is performed for the on-screen micropolygons.
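
In code, the front end looks roughly like this ( a toy sketch where a screen-space rectangle stands in for a real NURBS or subdivision patch; the bound/cull/split/dice control flow is the point, not the primitive type ):

[code]
#include <vector>

// Toy stand-in for a high-level primitive: an axis-aligned screen-space
// rectangle. A real renderer would bound and split NURBS or subdivision
// patches; only the control flow is meant to be representative here.
struct Patch { float xmin, xmax, ymin, ymax; };

struct Grid { std::vector<float> xs, ys; };  // grid of shading points

bool onScreen(const Patch& p, int w, int h) {
    return p.xmax >= 0 && p.xmin < w && p.ymax >= 0 && p.ymin < h;
}

// shadingRate ~0.5 px gives micropolygons about half a pixel on a side,
// matching the Nyquist argument above.
void render(const Patch& p, int w, int h, float shadingRate,
            std::vector<Grid>& out) {
    if (!onScreen(p, w, h)) return;  // cull off-screen parts
    const int maxGrid = 16;          // dice once the grid would be modest
    float sx = p.xmax - p.xmin, sy = p.ymax - p.ymin;
    if (sx <= maxGrid * shadingRate && sy <= maxGrid * shadingRate) {
        Grid g;                      // dice into a grid of vertices...
        for (float y = p.ymin; y <= p.ymax; y += shadingRate)
            for (float x = p.xmin; x <= p.xmax; x += shadingRate) {
                g.xs.push_back(x);
                g.ys.push_back(y);
            }
        out.push_back(g);            // ...which is then shaded in SIMD
    } else {                         // still too big: split into quarters
        float mx = (p.xmin + p.xmax) / 2, my = (p.ymin + p.ymax) / 2;
        render({p.xmin, mx, p.ymin, my}, w, h, shadingRate, out);
        render({mx, p.xmax, p.ymin, my}, w, h, shadingRate, out);
        render({p.xmin, mx, my, p.ymax}, w, h, shadingRate, out);
        render({mx, p.xmax, my, p.ymax}, w, h, shadingRate, out);
    }
}
[/code]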

Just a few remarks:
- The algorithm is not T&L bound. Transformations are performed before dicing, and transforming control points (polygons are used very rarely, for many reasons) is not expensive. The algorithm is bound by the dicing and shading stages.
- Shading micropolygons is not more expensive than shading pixels.
- Using micropolygons is the only "practical" way (that we know of) to do true (subpixel) displacement mapping (and you really need subpixel precision to avoid aliasing).
- The stochastic sampling in the final stage offers high-quality antialiasing, motion blur and depth of field at very little cost (a small sketch follows this list).
- Keep in mind that when you have high-quality motion blur you don't need more than 30-40 fps (unless you think DVD movies are choppy). Also consider that we can already render at around 0.1 fps using an Athlon XP at 1.67 GHz (~1.6 GFlops, since PRMan is not using any SSE optimizations), so I don't think we need 1 Teraflop.
- It's scalable and fits well in the Cell (PS3) architecture.
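
To illustrate the stochastic sampling remark ( a toy sketch of jittered space-time sampling; PRMan's actual sampler is of course more sophisticated ):

[code]
#include <cstdlib>
#include <vector>

struct Sample { float x, y, t; };  // subpixel position + shutter time

static float rnd() { return std::rand() / (float)RAND_MAX; }

// Jittered samples for one pixel: n x n spatial strata, each with a
// random shutter time. The same samples that antialias edges also
// distribute visibility over the shutter interval, which is why motion
// blur (and, with lens jitter, depth of field) comes almost for free.
std::vector<Sample> pixelSamples(int px, int py, int n) {
    std::vector<Sample> s;
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            s.push_back({px + (i + rnd()) / n,
                         py + (j + rnd()) / n,
                         rnd()});  // time in [0,1)
    return s;
}

// A micropolygon stored with its positions at shutter open and close;
// testing a sample just interpolates to the sample's time first.
struct MovingMp { float x0, y0, x1, y1, radius; };

bool hits(const MovingMp& mp, const Sample& s) {
    float x = mp.x0 + (mp.x1 - mp.x0) * s.t;  // position at time s.t
    float y = mp.y0 + (mp.y1 - mp.y0) * s.t;
    float dx = s.x - x, dy = s.y - y;
    return dx * dx + dy * dy <= mp.radius * mp.radius;
}
[/code]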

I think it's now apparent that this is a very practical architecture when the common case is motion-blurred, displacement-mapped geometry with very high-quality antialiasing and depth of field. It seems that the real-time world is moving towards a REYES-like architecture, while at the same time the RenderMan world is moving slowly towards a raytracing / global-illumination architecture.
And of course this is just my personal, biased opinion.
 
I can't help but see the irony in this statement, as the PS2's (launched in 2000) equivalent of Vertex Shaders is quite a bit beyond those in even the R300 (2002). Perhaps in DX10 they will finally catch the VUs when it comes to things like vertex creation/destruction.
I don't think so. The PS2's vertex creation/destruction is nowhere near the R300 core. I don't even understand where you are coming from with this. That is even talking theoretical maximums. Real-world performance is considerably less than that, due to the insane balancing act you have to do all the time just to avoid endless pipeline stalls etc...

DX10.. please... :rolleyes:
 
Pavlos,

What resolution is the 0.1 fps for? Can you describe the scene you achieved this on? (Actually, links to PRMan "benches" would be even better.)

Thanks,
Serge
 
Hellbinder[CE] said:
I can't help but see the irony in this statement, as the PS2's (launched in 2000) equivalent of Vertex Shaders is quite a bit beyond those in even the R300 (2002). Perhaps in DX10 they will finally catch the VUs when it comes to things like vertex creation/destruction.
I don't think so. The PS2's vertex creation/destruction is nowhere near the R300 core. I don't even understand where you are coming from with this. That is even talking theoretical maximums. Real-world performance is considerably less than that, due to the insane balancing act you have to do all the time just to avoid endless pipeline stalls etc...

DX10.. please... :rolleyes:

You want to play performance ?

[church lady] let's do some math, shall we ?[/church lady]

EE has in TOTAL ~13 Million Transistors and runs at 300 MHz...

How big is the R300's Vertex Shader/T&L core ? And how many transistors does it use ?

However, I agree... Vertex Creation is splendid in the R300, yet no developer seems to be using its Displacement Mapping unit, and you are right: the R300 Vertex Shaders' creation/destruction capabilities seem far from the EE's VUs... the R300 has to use the host CPU, as Vertex Shaders do not create or delete vertices... and Vince hasn't been the only one saying that the EE's VUs are better than current VS...
 
Pavlos said:
As DeanoC said, this is not a new idea. Pixar uses REYES (Renders Everything You Ever Saw), a micropolygon-based architecture. The first stage of the rendering pipeline transforms and splits complex high-level primitives, basically NURBS and subdivision surfaces, into smaller ones. It culls the off-screen parts and continues splitting until a primitive is simple enough, then dices it into a grid of micropolygons. Micropolygons are approximately half a pixel on a side in screen space (this corresponds to the famous Nyquist limit). Then the algorithm shades the micropolygon grids in SIMD fashion. In the final stage each grid (until now just a group of vertices) is converted to micropolygons and stochastic sampling is performed for the on-screen micropolygons.

Just a few remarks:
- The algorithm is not T&L bound. Transformations are performed before dicing, and transforming control points (polygons are used very rarely, for many reasons) is not expensive. The algorithm is bound by the dicing and shading stages.
- Shading micropolygons is not more expensive than shading pixels.
- Using micropolygons is the only "practical" way (that we know of) to do true (subpixel) displacement mapping (and you really need subpixel precision to avoid aliasing).
- The stochastic sampling in the final stage offers high-quality antialiasing, motion blur and depth of field at very little cost.
- Keep in mind that when you have high-quality motion blur you don't need more than 30-40 fps (unless you think DVD movies are choppy). Also consider that we can already render at around 0.1 fps using an Athlon XP at 1.67 GHz (~1.6 GFlops, since PRMan is not using any SSE optimizations), so I don't think we need 1 Teraflop.
- It's scalable and fits well in the Cell (PS3) architecture.

I think it's now apparent that this is a very practical architecture when the common case is motion-blurred, displacement-mapped geometry with very high-quality antialiasing and depth of field. It seems that the real-time world is moving towards a REYES-like architecture, while at the same time the RenderMan world is moving slowly towards a raytracing / global-illumination architecture.
And of course this is just my personal, biased opinion.

It appears I was wrong on the T&L issue; I wasn't thinking about transforming only the control points and then creating the micro-polygons ( which is laughable, as I think I even mentioned that as a technique to do deferred T&L :LOL: ).

That is interesting... but wouldn't we want to see global illumination as well ?

What about using photon mapping ( a mix of pre-generated photon maps and real-time photon mapping ) ?

Could we use photon mapping, which is cheaper than ray tracing, with this REYES-like architecture ?

I found a paper in which the author ( a CS student; it was his thesis, IIRC ) presented a photon mapping implementation running at 40 fps on a Pentium III 866 MHz... of course the scene was quite simple and the number of photons shot was not impressively high, but Cell appears to be, ahem... faster, quite a bit faster than a Pentium III 866 MHz with SSE ;)

Photon mapping provides several advantages: cheap global lighting ( well, cheaper than ray-tracing or radiosity, considering what it brings ), caustics, color bleeding, refractions, reflections, etc...
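
From what I have read, the heart of photon mapping's cheapness is the radiance estimate: gather the nearest photons around the shading point and divide their total power by the disc they cover. A naive sketch ( linear search instead of the kd-tree a real implementation like Jensen's would use, and ignoring the BRDF weighting ):

[code]
#include <algorithm>
#include <cstddef>
#include <vector>

struct Photon { float x, y, z, power; };  // position + stored flux

// Naive radiance estimate at (px, py, pz): find the k nearest photons
// and divide their summed power by the area of the disc containing
// them: L ~ sum(power) / (pi * r^2).
float radianceEstimate(std::vector<Photon> photons,  // by value: we reorder it
                       float px, float py, float pz, std::size_t k) {
    if (k == 0 || photons.size() < k) return 0.0f;
    auto dist2 = [&](const Photon& p) {
        float dx = p.x - px, dy = p.y - py, dz = p.z - pz;
        return dx * dx + dy * dy + dz * dz;
    };
    // Partition so the k nearest photons occupy the first k slots.
    std::nth_element(photons.begin(), photons.begin() + (k - 1), photons.end(),
                     [&](const Photon& a, const Photon& b) {
                         return dist2(a) < dist2(b);
                     });
    float r2 = dist2(photons[k - 1]);  // squared distance to k-th nearest
    float sum = 0.0f;
    for (std::size_t i = 0; i < k; ++i) sum += photons[i].power;
    return sum / (3.14159265f * r2);   // flux / disc area
}
[/code]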

I am interested in this as I definitely want to see better lighting ( well, in the graphical department ).

What you say about REYES allowing high-quality motion blur, making 30 fps games feel as smooth as if they ran at a higher frame-rate ( it works for DVD movies... ), sounds very intriguing.

That would allow more resources to be used for rendering... you get double the frame time... :)

Won't that be a problem with progressive scan ? I haven't given it too much thought ( it could be a brain fart, as they say... ;) ), but my gut reaction is that we can still output 480p and 720p if we use full frames...
 
Panajev2001a said:
R300 Vertex Shaders' creation/destruction capabilities seem far from the EE's VUs... the R300 has to use the host CPU, as Vertex Shaders do not create or delete vertices... and Vince hasn't been the only one saying that the EE's VUs are better than current VS...

Yes, I thought that current implementations of VS under DX9 lack the ability to create or destroy vertices in the shader itself.

Hellbinder[CE] said:
I don't think so. The PS2's vertex creation/destruction is nowhere near the R300 core.

The R300 can't do either in its Vertex Shader, IIRC; correct me if I'm wrong. So of course the PS2 is nowhere near the R300 core, as the R300 core can't do it under DX... it uses the host CPU.

Did you misinterpret what I wrote and think I meant sheer vertex creation, as in triangle/polygon count? Because I didn't intend that, and I'm sorry if I gave that impression.
 
Pavlos said:
- Keep in mind that when you have high-quality motion blur you don't need more than 30-40 fps (unless you think DVD movies are choppy).

This depends; games generally have faster cameras and action than movies. Movies can cut between different camera angles, and when there is panning or zooming it is done slowly.

That said, action-packed DVD movies could use a higher frame rate too.
 
Erhm... that's very nice, but I think I've seen this one before.

Wasn't the PS2 supposed to be so powerful that it could do everything with polygons, and texturing would no longer be required?
 
Eh, the proof is in the pudding, guys... all the PS2 games I've played (most of them) have relatively blurry textures, terrible aliasing, very primitive lighting models (vertex lighting for the entire world, ugh), and while the polygon counts are "good" they're not at all "mind-boggling". Now, they might be really fun games (they are), but there are obviously some technical shortcomings in the way the PS2 was made.

(Layman's opinion here.) It's true that REYES is awesome, but it's not real-time rendering. For it to work in real time (which would be really cool) you need an absolutely insane amount of power/bandwidth/memory/resources. In my eyes that was the "problem" with the PS2: it lacked the resources to pull that kind of thing off in real time. The question should be: will the PS3 reach that "critical mass" of power, or will it still be "ahead of its time" like the PS2 was/is?
 
Pavlos said:
Also consider that we can already render at around 0.1 fps using an Athlon XP at 1.67 GHz (~1.6 GFlops, since PRMan is not using any SSE optimizations), so I don't think we need 1 Teraflop.

The Athlon can dual-issue x87, so performance is more likely to be around 3 GFlops, which makes 1 TFlops sound about right for a 25-30 fps target.

However, I still very much doubt the 1 TFlops number.

As Panajev mentioned above, it would require 128 FMAC units (organized as 32 4-way SIMD units) running @ 4 GHz. That will make the PS3 double as a space heater.
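
Spelling out the arithmetic ( using only the numbers quoted in this thread, and counting an FMAC as two flops ):

[code]
// Scale Pavlos' offline data point up to real time, then check the
// 128-FMAC peak figure against it.
constexpr double athlon_gflops = 3.0;   // dual-issue x87 estimate above
constexpr double offline_fps = 0.1;
constexpr double target_fps = 30.0;
constexpr double needed_gflops =
    athlon_gflops * (target_fps / offline_fps);   // = 900 GFLOPS
constexpr double peak_gflops = 128 * 2 * 4.0;     // FMACs * flops * GHz = 1024
[/code]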

Cheers
Gubbi
 