If PS3 can really do 1Tflops

Of course you must include sampling and filtering.
Well, you said "shader" comparison at first... ;)
Anyway, I'm not arguing that a generic CPU could perform well in software rendering - I'm arguing that the reason why they can't perform well has little to do with their flops rating, and a lot to do with how memory accesses are handled in CPU as opposed to a GPU.

T&L benchmarks (which are "less" memory constrained than pixel sampling ops) show exactly that too - the longer the shader, the more efficient CPUs are at it.
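To put a toy number on that intuition, here's a minimal roofline-style sketch in Python (all the peak-FLOPS, bandwidth and byte figures are invented placeholders, not real CPU/GPU specs): a short, fetch-heavy pixel op leaves the CPU stalled on memory, while a long shader becomes compute-bound and the gap shrinks toward the raw FLOPS ratio.

[code]
# Roofline-style toy model: is a shader limited by memory traffic or arithmetic?
# All figures below are made-up illustrative numbers, not real hardware specs.

def shader_time_per_pixel(flops, bytes_touched, peak_gflops, bandwidth_gbs):
    """Lower bound on per-pixel time in ns: whichever of compute or memory dominates."""
    compute_ns = flops / peak_gflops           # 1 GFLOP/s == 1 FLOP per ns
    memory_ns = bytes_touched / bandwidth_gbs  # 1 GB/s   == 1 byte per ns
    bound = "memory" if memory_ns > compute_ns else "compute"
    return max(compute_ns, memory_ns), bound

for instr in (8, 100):
    # The CPU does its own texture fetches; the GPU's sampling/filtering hardware
    # is assumed to hide that traffic entirely (which is the whole point here).
    cpu_ns, cpu_bound = shader_time_per_pixel(instr, 32, peak_gflops=6, bandwidth_gbs=3)
    gpu_ns, gpu_bound = shader_time_per_pixel(instr, 0, peak_gflops=20, bandwidth_gbs=20)
    print(f"{instr:>3}-instruction shader: CPU {cpu_ns:.1f} ns ({cpu_bound}-bound), "
          f"GPU {gpu_ns:.1f} ns ({gpu_bound}-bound)")
[/code]

With these made-up numbers the GPU's lead drops from roughly 27x on the short shader to about 3x on the long one - the flops rating only starts to matter once the memory wall stops being the bottleneck.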
 
I'm arguing that the reason why they can't perform well has little to do with their flops rating, and a lot to do with how memory accesses are handled in CPU as opposed to a GPU.

Which is the reason that CPUs won't be comparable with GPUs for a couple of decades :)

T&L benchmarks (which are "less" memory constrained than pixel sampling ops) show exactly that too - the longer the shader, the more efficient CPUs are at it.

Do you have examples of that? I haven't seen any software rasterizers that can handle a decently complex shader effect (say ~100 instructions) at anything close to GPU levels of performance. Then again, I'm mainly stuck comparing 'offline' render engines to DX9 and looking at the end results. When rendering basic ops I can get close to 1% comparing CPU to GPU, add shaders and it drops off considerably (0.001% is closer). I'm sure aiming at a fixed platform, using assembly optimizations and focusing on a particular shader op you could do much better, but you honestly think you could get close to a GPU? SGI, Discreet and Pixar haven't even come close yet.
 
BenSkywalker said:
Numerous elements, some I have a direct response to below, but for obvious ones, reading back data from the FB and having to reprocess/re-rasterize.

Actually, you listed none AFAIK; the question still stands. In fact, it's [Vertex Shading] nothing different than what's done today on the PS2, or a PC with a TNT2-generation rasterizer. Or it's [fragment] nothing different than what will eventually happen post DX10, or whenever they combine the processing resources in the 3D pipe. How is this impossible?

Memory costs would kill you

Last time I checked we were talking processing, not memory... nice try buddy. Your points are pretty unfounded IMHO; especially on the topic of RT raytracing, in which a cellular processor would utterly destroy you doing it in a DX10 shader.

Nobody is talking about software rasterization. In case you don't get it, and until someone proves or explains different - a design like Cell isn't that much different than a P10 or other advanced architecture.

The P10 is quite slow compared to its contemporaries, not to mention expensive and not much more flexible (at least compared to the NV30).

Ben, come on dude:

(a) The P10 isn't a gaming card
(b) The P10 is fully SIMD; we're talking about having dedicated hardware do the things like sampling and filtering which just eat processing resources, but have significant programmable VU-like processors that are a parallel to the NV30's and R300's Fragment/Pixel and Vertex Shaders. Especially in the case of the NV30's Vertex/TCL front-end - but with time VS and PS will merge anyways.

Of course you must include sampling and filtering. I'm talking about what is possible with 1TFLOP of general purpose CPU power.

Ben, AFAIK (maybe Faf is, in which case I apologize), nobody here is talking about full software rasterization. And nobody is talking about a CPU architecture like that found in x86. When will you get this, bud?

I'll continue in another post.
 
BenSkywalker said:
Hmm, 100,000 P4s @3GHZ would be 2.4PFLOPS, that could likely do very nicely for real time software rendering ;)

And 1,000 Cell-like clustered processors would eat it alive, between its onboard memory and a superior design that's suited to scientific processing.

You keep thinking of Cell in terms of x86 processors like the Xeon, P4, or Athlon. What's a bit closer (more or less in the same 'family'): how about you take an Emotion Engine, clock it at 4GHz, multiply it by 16 to get the same number of FMADs; on top of that add in the pool of eDRAM, the APU caches, and the efficiency of it being a new, efficient design using IBM's superior layout tools, et al.
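For what it's worth, here's the back-of-the-envelope version of that scaling (a rough sketch: the ~6.2 GFLOPS at ~300 MHz baseline is the commonly quoted EE peak, while the 4GHz clock and the x16 replication are just the hypotheticals from this post, not announced specs):

[code]
# Back-of-the-envelope scaling of the Emotion Engine, per the post above.
# Baseline: the commonly quoted ~6.2 GFLOPS peak at ~300 MHz for the EE.
ee_peak_gflops = 6.2
ee_clock_ghz = 0.3

flops_per_cycle = ee_peak_gflops / ee_clock_ghz   # ~21 FLOPs issued per cycle
scaled_clock_ghz = 4.0                            # hypothetical 4 GHz clock
replication = 16                                  # hypothetical 16x the FMAD resources

peak_tflops = flops_per_cycle * scaled_clock_ghz * replication / 1000
print(f"~{flops_per_cycle:.0f} FLOPs/cycle -> ~{peak_tflops:.1f} TFLOPS peak")
# prints roughly 1.3 TFLOPS - the same ballpark as the ~1 TFLOPS being argued about
[/code]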

A TFLOP is nothing big for a render farm, I would expect Pixar to be closer to PFLOP territory (I'm not sure on that, but I would expect it).

Dude, you're dreaming or have been sniffin' the Sharpie too much; as zidane1strife implied:

The top-ranked supercomputer, NEC's Earth Simulator, peaks out at 40 TFLOPS and can sustain 35 TFLOPS - using dedicated vector processors.

After that is the Dept. of Energy's ASCI series, which peaks around 10 TFLOPS and goes south from there - using off-the-shelf components.

So, in theory you'll get near 2002 supercomputer power at a millionth of the cost in electricity, at a fraction of the size, and it will be in a console.

What's even more impressive is that, thanks to the advances in lithography, others will pass this by... and a cluster the size of the ASCI series - 4096 processors - will have massive impacts on the scientific community. I can't wait to see how molecular medicine uses this. The future will be good indeed.
 
The P10 is fully SIMD; we're talking about having dedicated hardware do the things like sampling and filtering which just eat processing resources, but have significant programmable VU-like processors that are a parallel to the NV30's and R300's Fragment/Pixel and Vertex Shaders. Especially in the case of the NV30's Vertex/TCL front-end - but with time VS and PS will merge anyways.

This is a little hard to decrypt. The P10 is using scalar processors, not much SIMD going on there. As for flexibility, are you saying the P10 is more or less flexible than the NV30 and R300? I'm guessing more, but your wording isn't helping.

Slipstream floating-point everything, or rather nearly everything, more execution units and a faster clock would so kick butt. If only the next Px would have all that and come out really soon. ;)
 
When rendering basic ops I can get close to 1% comparing CPU to GPU, add shaders and it drops off considerably (0.001% is closer).
Hmm, 100,000 P4s @3GHZ would be 2.4PFLOPS, that could likely do very nicely for real time software rendering

Well, if you're right and general-purpose CPUs get just 1% of what true h/w designed for 3D graphics can get... then those are rendering what a GPU-esque architecture would do with 24 TFLOPS...

I believe the final Cell processor could reach a TFLOPS rating beyond the speculated and officially announced and maintained number: 6+ TFLOPS, and I believe it could reach slightly above 10 TFLOPS... 10+ TFLOPS on h/w designed for handling graphics... which, if that 1% number is true, would be comparable to 1 PFLOPS of general CPU power.
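Spelling out the arithmetic behind that (a rough sketch - the 1% efficiency figure and the peak numbers quoted above are taken at face value, not measured):

[code]
# Working through the efficiency comparison above. The 1% figure is Ben's rough
# observation quoted earlier in the thread, taken here at face value.

# 100,000 P4s @ 3 GHz at the claimed 8 FLOPs/cycle (4 would halve this):
p4_cluster_pflops = 100_000 * 3.0e9 * 8 / 1e15
print(p4_cluster_pflops)                      # ~2.4 PFLOPS peak

cpu_vs_gpu_efficiency = 0.01                  # CPUs at ~1% of GPU rendering throughput
gpu_equivalent_tflops = p4_cluster_pflops * 1000 * cpu_vs_gpu_efficiency
print(gpu_equivalent_tflops)                  # ~24 TFLOPS worth of "GPU-like" rendering

# Conversely, a ~10 TFLOPS graphics-oriented design would match ~1 PFLOPS of
# general-purpose CPU power under the same 1% assumption:
print(10 / cpu_vs_gpu_efficiency / 1000)      # ~1.0
[/code]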

PS yes, I'm optimistic, maybe TOO optimistic, but I just can't help it.

EDITED
 
Ben, the funny thing is that you invented your own argument and set up your own assumptions and restrictions that declared you the winner...

Who in this thread was taking 1 TFLOPS and thinking the WHOLE 3D pipeline was going to be implemented in software ( including sampling, etc... )...

Basically, whenever somebody made the case that, as far as T&L and Fragment Shading are concerned, 1+ TFLOPS ( 1-1.25 TFLOPS ) is a nice achievement, you kept going back to the "but I was just saying it in regard to FULL software rendering"... and that is not the point of the thread...

You admitted yourself that you expect Sony to provide a ( decent ) rasterizer ( which also includes Cell technology as the patent indicated ) and this makes the reasons why you started the argument even less clear to me...

You say that we should not be amazed by 1 TFLOPS because a PURE software renderer would not be competitive in 2005 even with 1 TFLOPS and then you say that it doesn't really matter as PS3 will not really go on a purely software route and there will be a rasterizer with 3D dedicated silicon... :?
 
randycat99 said:
Without a doubt, that will be a paramount issue for all next-generation consoles. It's very easy to throw together half a dozen effects indiscriminately to prove a hardware design. It's not easy to build an intricate artistic depiction that looks integrated, deliberate, and inherently artistically motivated, even if the hardware that could effortlessly pull it off is sitting right in front of you. Aside from whether or not the talent is present, will the time and budget be present?

True, true, but that is not why I mentioned squeezing the artist into the console. Shots are directed in offline rendering; you can't do that in an interactive application. Also, rendering hacks are chosen to fit the scene and vice versa... that can still be done to some extent in interactive applications, but to a far lesser degree.

On the whole, rendering has to be more realistic/complex in interactive rendering than in offline rendering (for the same quality). Artists spend a lot of "FLOPS" to make each shot and even each frame look good, which can't be captured in the simple equation of the number of cycles spent by the cluster... and they can cover up rendering artifacts, some by anticipating them and some they notice afterwards. Without the artists, your only option is getting it right in one go and trying to predict how rendering will look from every possible viewpoint, instead of just the preselected ones used in a movie.
 
Do you have examples of that? I haven't seen any software rasterizers that can handle a decently complex shader effect (say ~100 instructions) at anything close to GPU levels of performance.
I was talking about geometry benchmarks.
In simple transforms GPUs would always dominate, but lose tons of ground in more complex shader/lighting situations where actual calculations, as opposed to memory accesses, become the dominant factor. Actually, from what I remember, high-end CPUs typically even outperformed GPUs in those situations.
You could even see that pattern in PS2 launch performance comparison graphs released by Sony...

I'm sure aiming at a fixed platform, using assembly optimizations and focusing on a particular shader op you could do much better, but you honestly think you could get close to a GPU?
Of course not, I was never arguing that.
 
One thing I'm not completely certain about:

Why compare off-line renderers' algorithms and assume those same algorithms would be used for CPU-only realtime rendering? Offline renderers are calculating everything with enormous precision - more than likely, what they are doing can be reasonably approximated with simpler algorithms, speeding up the process significantly?

Another thing is, I'm not sure anymore the Cell type processor can even be considered as a 'general purpose CPU'...
 
It certainly is a grand departure from a generic x86 CPU implementation sitting on your desk. I think that's where the bulk of these "software renderers can never be fast" arguments crumble to the ground.
 
BenSkywalker said:
1024 * 2.8 GHz * 4FLOP/cycle ~ 11.5 TFlops

Looking at peak FLOPS ratings you should use the theoretical 8 FLOPS per clock of the Xeon, not the 4 of the P3. Still a lot less than I expected, only ~23 TFLOPS. Surprising to me that they don't drop a bit more cash on renderfarms.

The P4 can only issue one FP instruction per cycle (well two, but one has to be a move/load/store), with SSE that becomes 4 FLOPs/cycle.
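As a quick sanity check on the figures being thrown around (a sketch, assuming the 1024-node, 2.8 GHz cluster from the quote):

[code]
# Peak-FLOPS check, assuming the 1024-node, 2.8 GHz cluster from the quote.
nodes, clock_ghz = 1024, 2.8

# 4 = the realistic SSE issue rate described above, 8 = the theoretical Xeon figure
for flops_per_cycle in (4, 8):
    peak_tflops = nodes * clock_ghz * flops_per_cycle / 1000
    print(f"{flops_per_cycle} FLOPs/cycle -> ~{peak_tflops:.1f} TFLOPS peak")
# 4 FLOPs/cycle gives the ~11.5 TFLOPS quoted; 8 gives the ~23 TFLOPS figure
[/code]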

Cheers
Gubbi
 
marconelly! said:
Why compare off-line renderers' algorithms and assume those same algorithms would be used for CPU-only realtime rendering?

Dunno about others, but I quite explicitly wasn't ... although my position is diametrically opposed to yours. IMO you need to simulate reality closer to get the same quality in interactive rendering ... although the smaller number of pixels is nice.

Offline renderers are calculating everything with enormous precision - more than likely, what they are doing can be reasonably approximated with simpler algorithms, speeding up the process significantly?

That is what we are doing now ... we are talking about matching the quality of offline rendering. If there were shortcuts they could take which would still let them achieve the same quality, the people doing offline rendering would be using them already.
 
MfA said:
If there were shortcuts they could take which would still let them achieve the same quality, the people doing offline rendering would be using them already.

...or maybe they are using what has already been established as tried and true, and generally uncompromised as far as intended quality. If offline rendering is the target, that certainly puts time-savings via alternative techniques at a lesser priority.
 
MfA said:
diametrically opposed

Ahh!! Thanks, this phrase has been on the tip of my tongue for over a week now - I keep being led back to polar, when I was thinking diametric.

That is what we are doing now ... we are talking about matching the quality of offline rendering. If there were shortcuts they could take which would still let them achieve the same quality, the people doing offline rendering would be using them already.

While I understand what you're saying (and you're right), I think what he's trying to say is that in the real, real-time world you don't need 16-tap stochastic multisampling, then supersampling the result, to provide an image which is good enough.

The P10 is using scalar processors, not much SIMD going on there.

You're right, thanks for the correction. That chart with the progression of GPU/CPU architectures popped into my head and I remembered the part labeled 100% SIMD distinctly... bummer.
 
randycat99 said:
MfA said:
If there were shortcuts they could take which would still let them achieve the same quality, the people doing offline rendering would be using them already.

...or maybe they are using what has already been established as tried and true, and generally uncompromised as far as intended quality. If offline rendering is the target, that certainly puts time-savings via alternative techniques at a lesser priority.


I dunno. I've taken many shortcuts in my life and have found that while they offer results close to the real way, it's never quite the same. I feel that if realtime rendering increases 10-fold this year, then farm rendering will increase 10-fold too, thus there is no way for realtime to ever catch up.

TV quality isn't going to be around much longer at 640x480. The bar will be raised. Who in God's name will buy an HDTV and a PS4 and then want to play it at NTSC res? I know I wouldn't. I'm pissed if I can't play a game at 1600x1200 with 2x FSAA and 16-tap aniso on my PC. That's another thing that will come into play: FSAA. Render stations can use any form and magnitude of FSAA they want. Realtime rendering will be handicapped there. Realtime rendering is still limited to a fixed number of lights and always will be. Sure, 20 or 30 light sources would be nice, but not compared to the thousands they can do in offline rendering. Because if they really wanted to, they could set insane goals for offline rendering and then give the computers as much time as needed to complete it, all the while using newer and newer tech. Athlon 64s will push the bar again, as will the faster IBM chips, Intel chips and what have you. Comparing 2003 tech to 2005-6 tech that may not even reach the speeds they are suggesting is funny as hell, especially when some chips are meant to do a broad set of tasks and some are meant to do a specific thing really fast.
 
Before you go too far with your train of thought there, you do realize that you are moving the CG "goal posts" in order to bolster your point, right? No one is arguing that realtime will surpass offline CG if you keep moving the offline CG front to whatever is SOTA at the time. I think some can agree that today's offline CG looks pretty damn good (maybe not even today-today, but let's just say FF:TSW movie sort of CG), and that level is plausibly achievable in realtime CG on next-generation hardware. That is all. It's a lofty and worthwhile goal, and no honor has been lost in the offline CG camp for having done so.
 
randycat99 said:
Before you go too far with your train of thought there, you do realize that you are moving the CG "goal posts" in order to bolster your point, right? No one is arguing that realtime will surpass offline CG if you keep moving the offline CG front to whatever is SOTA at the time. I think some can agree that today's offline CG looks pretty damn good (maybe not even today-today, but let's just say FF:TSW movie sort of CG), and that level is plausibly achievable in realtime CG on next-generation hardware. That is all. It's a lofty and worthwhile goal, and no honor has been lost in the offline CG camp for having done so.


And I still say that your goal of the Final Fantasy movie is way off. I have yet to see any in-game graphics that come close to Toy Story graphics. Toy Story, which is how friggen old? You guys are the ones that are moving goal posts. Start with the first CGI movie and then move forward. Until we surpass Toy Story, A Bug's Life, Antz, Toy Story 2 and whatever else there is, we can't move on to more modern movies. I believe that the CGI used for FMV in Final Fantasy 7 on the PSX will be able to be done in realtime on the PS3. But that is nowhere close to Toy Story graphics IMHO. Let's not even get into Gollum from Lord of the Rings: TT. That's a movie that has been out since last year. I think you guys are expecting a leap way too big. And considering how small the leap from PS1 to PS2 was in the grand scheme of things, I doubt this next leap will be much bigger in the grand scheme.
 
If you have 100 subpixel polygons for every pixel, you do need "ridiculous" supersampling ratios.
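A toy way to see why (a purely illustrative Monte Carlo sketch, not how any real renderer samples): with ~100 sub-pixel polygons contributing to a pixel, a handful of point samples only "sees" a handful of them, so the pixel value is dominated by sampling noise until the sample count is of the same order as the micropolygon count.

[code]
import random

# Toy illustration: one pixel covered by many tiny polygons, each with its own
# shade. Estimate the pixel value by point-sampling N locations and compare the
# result with the true area-weighted average. Purely illustrative numbers.
random.seed(1)
n_micropolys = 100
shades = [random.random() for _ in range(n_micropolys)]  # one grey value per micro-polygon
true_value = sum(shades) / n_micropolys                  # assume equal coverage per polygon

for samples in (4, 16, 64, 256):
    # each point sample lands on one micro-polygon chosen at random
    estimate = sum(random.choice(shades) for _ in range(samples)) / samples
    print(f"{samples:>3} samples/pixel -> error {abs(estimate - true_value):.3f}")
# The error only falls off as ~1/sqrt(samples), so you need dozens-to-hundreds of
# samples per pixel before the per-pixel noise stops being visible.
[/code]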

BTW, I am fairly certain that in a couple of years no one will be using general-purpose CPUs as the workhorses for doing offline rendering anymore. x86(-64) might see a couple of days in the sun; non-x86 processors are out of the question going forward, given that their cost/performance ratio is abysmal, but after that multimedia processors will take over IMO (an out-of-fashion term, but still a pretty good characterisation for processors like modern GPUs and Cell). Hell, the kind of processors in use today will probably go extinct altogether, in that they won't even be used as cores inside parallel processors; the ever-increasing transistor counts for cores we have seen over the years will be reversed, I tell ya :)
 
I believe that the CGI used for FMV in Final Fantasy 7 on the PSX will be able to be done in realtime on the PS3. But that is nowhere close to Toy Story graphics IMHO. Let's not even get into Gollum from Lord of the Rings: TT. That's a movie that has been out since last year. I think you guys are expecting a leap way too big. And considering how small the leap from PS1 to PS2 was in the grand scheme of things, I doubt this next leap will be much bigger in the grand scheme.

FF7 FMV?!? Are you crazy or something? Outside of the environments, the characters and vehicles can easily be rendered on today's h/w. Heck, you can actually see the limb connections on their character models...

If there is a PS2 FF7 remake, I expect they'll use models of higher quality than those used in said FMVs.

As for the small jump... the PS2 appears to be more than 100 times the power of the PSone, with the ability to render things that are simply impossible on the PSone, not to mention some of its stats have yet to be surpassed even in the PC market.
 