If PS3 can really do 1 TFLOPS

randycat99 said:
jvd said:
Btw, 1 TFLOP is a lot of speed and power. But 1 TFLOP does not equal 100s of PCs...

Actually, 1 TFLOP would be about one hundred 3 GHz P4s. If one P4 can run the hell outta Doom 3, I don't think a lot of people will be complaining about how a game that utilizes 100x the resources of Doom 3 will look. So what is there to be disappointed over?

Mind you (if you hadn't read the first post in this topic), this topic isn't about whether or not you think the PS3 will deliver 1 TFLOPS. It is about what we can expect, assuming it does.

Btw, next time quote all of what I said: "Btw, 1 TFLOP is a lot of speed and power. But 1 TFLOP does not equal 100s of PCs working on a scene for weeks at a time." As you can see, I'm acknowledging how fast 1 TFLOP is. I'm saying that 1 TFLOP (realtime) doing what 100s of PCs take weeks to do will not happen.
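(A rough sanity check of the "one hundred 3 GHz P4s" figure, using the same 4 FLOPs/cycle peak rating quoted later in this thread; these are peak numbers, not sustained throughput:)

    # Back-of-the-envelope check of "100 x 3 GHz P4 ~ 1 TFLOP".
    # Assumed peak: 4 single-precision FLOPs per cycle via SSE (peak, not sustained).
    p4_peak = 3.0e9 * 4                  # ~12 GFLOPS per P4
    print(100 * p4_peak / 1e12)          # ~1.2 TFLOPS, so "about a hundred P4s" is the right ballpark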
 
It entirely depends on what each of those PCs (of the hundred) is capable of. I think it's pretty pointless to make a statement as vague as that. Is a game console going to be rendering movie resolutions or TV-ish resolutions? Are there shortcuts that can be exploited in a realtime videogame vs. doing everything "genuine" in a movie render?
 
jvd... if at 65 nm (with specs targeted probably for 45-50 nm) they do not pass 1 GHz, it would mean you are getting a BEAST even bigger than the one we are expecting...

I think they will be able to push the clock speed faster than that...

Sony has almost always delivered specs higher than the initial demos of the HW (the EE was demoed at 250 MHz and shipped at 300 MHz...)...
 
Panajev2001a said:
jvd... if at 65 nm (with specs targeted probably for 45-50 nm) they do not pass 1 GHz, it would mean you are getting a BEAST even bigger than the one we are expecting...

I think they will be able to push the clock speed faster than that...

Sony has almost always delivered specs higher than the initial demos of the HW (the EE was demoed at 250 MHz and shipped at 300 MHz...)...

I was just using an example, dude. I still think 3 GHz is high, but they should be able to do it. Also, look at Nvidia: they were always able to push out faster and faster chips using cutting-edge tech, and then the tech was plagued with problems. Who knows if that won't happen with the 65 nm or lower processes. Also, how do you know Sony wasn't aiming for 400 MHz on the EE, demoed alpha tech at 250, and the final hardware was only able to hit 300? What you see is only half the story.
 
My question still remains unanswered from before - is there a renderfarm out there being used to make movie sequences that was actually the size of 100 nodes? Surely, they weren't 3 GHz P4 jobs, either. My guess is that 16 or 32 would be a more likely number, and that each unit wasn't sporting the latest clock speeds. If such a system isn't even close to 1 TFLOPS, it won't be directly comparable anyway with regard to rendering movie frames in hours vs. rendering a videogame in realtime. For example, we could be talking about a renderfarm topping out at a "mere" 200 GFLOPS for all we know (not that 200 GFLOPS isn't a vast achievement in its own right).
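(For what it's worth, the "mere" 200 GFLOPS guess is roughly what you would get from, say, 32 nodes of 1.5 GHz chips at an assumed peak of 4 FLOPs per cycle; the node count and clock here are hypothetical, just to show the scale:)

    # Hypothetical renderfarm matching the guess above: 32 nodes at ~1.5 GHz each,
    # assuming a peak of 4 FLOPs per cycle per node.
    print(32 * 1.5e9 * 4 / 1e9)   # ~192 GFLOPS, i.e. in the "mere 200 GFLOPS" range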
 
randycat99 said:
It entirely depends on what each of those PCs (of the hundred) is capable of. I think it's pretty pointless to make a statement as vague as that. Is a game console going to be rendering movie resolutions or TV-ish resolutions? Are there shortcuts that can be exploited in a realtime videogame vs. doing everything "genuine" in a movie render?

Imagine what those same 100s of PCs can do at TV-ish res. I'm sure it would still be better than what the 1 TFLOP CPU can do in realtime.

Let's say those 100 PCs = 1 TFLOP. I doubt anyone here would question that 1 TFLOP doing 1 sec of footage, with a week to render it all, would be able to do a lot more detail than 1 TFLOP doing 1 sec of footage in realtime.
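(To put a number on that gap: at the same FLOPS rating, a week of render time per second of footage buys roughly 600,000 times as many operations per frame as rendering in realtime, ignoring every other factor:)

    # Compute budget per second of footage, offline vs. realtime, at the same FLOPS rating.
    offline_seconds  = 7 * 24 * 3600     # a week of machine time per 1 s of footage
    realtime_seconds = 1.0               # 1 s of machine time per 1 s of footage
    print(offline_seconds / realtime_seconds)   # ~604,800x more operations available per frame offline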
 
randycat99 said:
My question still remains unanswered from before - is there a renderfarm out there being used to make movie sequences that was actually the size of 100 nodes? Surely, they weren't 3 GHz P4 jobs, either. My guess is that 16 or 32 would be a more likely number, and that each unit wasn't sporting the latest clock speeds.


I believe The Lord of the Rings (ROTK) and The Hulk special effects are somewhere close to 75 CPUs, and since these movies are new and not yet released, it should be safe to say that they are using 2 GHz+ CPUs and computers that have multiple chips in them.
 
???

If it's 1 TFLOP of performance either way, why would a 100-node version look substantially different from the monolithic version?
 
randycat99 said:
???

If it's 1 TFLOP of performance either way, why would it be substantially different?

Because. You have one being able to take hours to put as much detail as possible into the rendering, and the other not even a second to get it rendered and displayed.

It's like you and me going to Rome: you having a day to see everything and me having a week. Which one of us would get to see more? It's the same thing with the PCs. All things being equal, the one with the most time to do its task would be able to add more detail. Even if all things are not equal, the one with no time constraints would be able to overcome any handicaps it might have had.
 
...But you are comparing a rendering implicitly tied to movie resolution to another rendering done at TV resolution. A lot of the demands will radically decline if the target resolution is lower. Now throw in the factor that you may use a speedier, realtime ray-tracing procedure in a game rather than the genuine, high-quality raytracing in a movie render.

So if you have a 1 TFLOP 100-node system and a 1 TFLOP monolithic system rendering to the same resolution in realtime, why would there necessarily be a difference? That was my point.
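(A rough illustration of how much the lower resolution alone buys you; the frame sizes are assumptions, since film scan resolutions vary:)

    # Pixel counts only: an assumed "2K" film frame vs. an NTSC-ish game frame.
    film_pixels = 2048 * 1556
    tv_pixels   = 640 * 480
    print(film_pixels / tv_pixels)   # ~10x fewer pixels to shade at TV resolution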
 
jvd, I understand your point, yet I think guys like the Sony/IBM/Toshiba group have access to better manufacturing processes and fabs than nVIDIA (TSMC was the bottleneck...)... according to a very optimistic estimate, nVIDIA and its partners (TSMC) are approx. 6 months behind guys like Intel and IBM as far as manufacturing technology goes... this could be a bit more, but still, 6 months is a LOT in the electronics world...

I think they are in a better situation than TSMC was when it started having problems at .13 um, even if the Broadband Engine should be such a large chip...
 
randycat99 said:
...But you are comparing a rendering implicitly tied to movie resolution to another rendering done at TV resolution. A lot of the demands will radically decline if the target resolution is lower. Now throw in the factor that you may use a speedier, realtime ray-tracing procedure in a game rather than the genuine, high-quality raytracing in a movie render.

So if you have a 1 TFLOP 100-node system and a 1 TFLOP monolithic system rendering to the same resolution in realtime, why would there necessarily be a difference? That was my point.
And I never said both would be rendering in realtime. I said the one that has weeks to render the scene can add more detail to it. Of course there would be no big difference between running a 100-node system and a monolithic one in realtime; the monolithic one would probably be faster. But as with anything, the more time you have, the more you can actually do. If I wanted to, I could make a scene that would take 3 years to render on 100 PCs (I'm talking TV res and real artists, since I have trouble making stick figures) and then ask you to do the same scene in realtime on the PS3 with all its effects, and you wouldn't come close. When did Toy Story come out? What, 97-98? Have we seen realtime graphics that good yet? I'm sure the PS2 is a lot faster than what they rendered Toy Story on.
 
Panajev2001a said:
jvd, I understand your point, yet I think guys like the Sony/IBM/Toshiba group have access to better manufacturing processes and fabs than nVIDIA (TSMC was the bottleneck...)... according to a very optimistic estimate, nVIDIA and its partners (TSMC) are approx. 6 months behind guys like Intel and IBM as far as manufacturing technology goes... this could be a bit more, but still, 6 months is a LOT in the electronics world...

I think they are in a better situation than TSMC was when it started having problems at .13 um, even if the Broadband Engine should be such a large chip...

I'm sure they are. But that doesn't mean a million things can't go wrong. I get what all of you guys are saying, except randyman; I think he isn't reading certain words in my posts. I'm just trying to say that anything can happen. Intel had problems with the Itanium, AMD had problems bringing the XP to .13 um. Things don't go as planned.
 
I think where we aren't "posting" eye-to-eye is on whether, say, a 24-hour sequence render takes that long simply because of detail, or whether there are additional factors that need to be accounted for, such as resolution, lighting algorithms, texture resolutions, etc. Surely it is a bit of all of those and more. Once you scale the target down to a realtime videogame presentation (and possibly account for 1 TFLOP of performance vs. whatever a certain renderfarm was capable of), I'm sure you won't need hours of rendering time anymore. Quite possibly it might get very close to a realtime framerate. Take a few shortcuts (with arguably imperceptible impacts on video quality), and you will be in the "window".

I realize all of this is conjecture, anyway. It just sounded like you were dismissing the idea that truly great things are possible under the given conditions. My intent was just to point out that you really can't say that for sure, and that some things may be more plausible than you think.
 
If you have a pool of computing resources (a methodology the PC industry is moving to), why can't shader programs (e.g. vertex, to start) be run on them? What prevents me from drawing 2 polygons for every pixel on a CPU and then running shaders on them before handing them off to a highly clocked, but simpler, rasterizer which then kicks it out to the framebuffer?

Numerous factors, some of which I have a direct response to below; among the obvious ones, reading data back from the FB and having to reprocess/re-rasterize.
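(Purely as an illustration of the pipeline Vince describes - programmable vertex work running on general-purpose cores and feeding a simple rasterizer - here is a hypothetical sketch; the function names are made up and no real API is implied:)

    # Illustrative only: do the "vertex shader" work on general-purpose cores,
    # then hand the transformed vertices to a simple, fast rasterizer stub.
    import numpy as np

    def vertex_shader(positions, mvp):
        # positions: (N, 4) homogeneous vertex positions; mvp: a 4x4 matrix
        return positions @ mvp.T         # the flexible, CPU-side part

    def rasterize(clip_verts):
        # stand-in for the highly clocked but simple rasterizer writing the framebuffer
        return len(clip_verts)

    verts = np.hstack([np.random.rand(1000, 3), np.ones((1000, 1))])
    print(rasterize(vertex_shader(verts, np.eye(4))))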

Or, what prevents me from running a more Brazil-like system (as opposed to PRMan's REYES) and running a ray-tracing routine that's divided between APUs, which each trace independent rays on a Cell-like MPU? Do this at a comparable speed in a fragment/vertex shader (which should be combined by DX10, IIRC).

Memory costs would kill you.

Nobody is talking about software rasterization. In case you don't get it, and until someone proves or explains differently - a design like Cell isn't that much different from a P10 or other advanced architecture.

The P10 is quite slow compared to its contemporaries, not to mention expensive and not much more flexible (at least compared to the NV30).

Faf-

Comparing physical shader speed (none of the sampling and filtering crap)...

Of course you must include sampling and filtering. I'm talking about what is possible with 1 TFLOP of general-purpose CPU power. Let's compare any CPU running some PS2.0 w/ 16xAF+trilinear to a GPU using FP32. My entire point in using the comparison is to point out that 1 TFLOP isn't that much.

Panajev-

Why would a DX10 GPU be faster at executing very long shaders with conditional branches than a modern, well-designed CPU?

By executing all branches at the same time and discarding those that aren't needed. An enormous waste of transistors on a CPU, a good choice on a GPU.
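(Roughly what "executing all branches and discarding" looks like in plain code; illustrative only:)

    # Instead of branching per element, a GPU-style approach evaluates both sides
    # for every element and keeps whichever one the mask selects.
    import numpy as np
    x = np.linspace(0.0, 1.0, 8)
    mask = x > 0.5
    result = np.where(mask, np.sin(x), np.cos(x))   # sin AND cos are computed for all elements
    print(result)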

My main issues with your argument

My argument? The discussion thread we are in asked what 1 TFLOPS would mean for graphics. I have stated repeatedly that 1 TFLOPS of general-purpose processing power won't do much against dedicated hardware, but Sony will almost certainly have a decent dedicated rasterizer.

BTW, you say T&L will not be a factor in the next generation... well, I disagree, as others did... fully dynamic and global lighting models (introducing things like radiosity or ray-tracing) will be a huge tax on any next-generation system, and you need RAW horsepower to do those... and IMHO the end result would be pretty good looking.

Radiosity? Not going to happen next gen; 1 PFLOP isn't enough for proper radiosity in realtime. Ray tracing: if current plans for utilizing HOS are true, you would have to tessellate, write to a RAM buffer, then run the ray calcs, and then start rasterizing.

Randy-

That would be informally equivalent to a renderfarm containing over one thousand 400 MHz PowerMac G4s. Those aren't exactly pokey machines, even using just one of them.

http://206.166.224.228/Asp2/RenderCalc.asp?WCI=Results

Here is a place where you can rent yourself some TFLOPS of computer time; the link is for a calculator comparing how long a job would take on a desktop PC vs. how long it would take on their render farm (2-4 TFLOPS, 250 rigs running either 2 GHz chips or 2x1 GHz). Take a constant amount of time to render out a given frame, say 30 minutes per processor, render out 300 frames' worth (ten seconds of a typical 30 FPS feed), and see how long their multi-TFLOP machine would take. A job that takes a 400 MHz Mac a total of 6 days 6 hours, the render farm pulls off in 15 minutes 36 seconds. That sounds impressive for raw computing power, but here are more up-to-date machines using the same settings (this is under Maya) -

Dual GHz Mac - 1 hour 13 minutes 7 seconds

2 GHz Athlon - 1 hour 18 minutes 45 seconds

3 GHz P4 - 1 hour 41 minutes 14 seconds

Assume that a scene with that level of complexity is too much for next gen (though if we wanted radiosity it would be way too low); how about one that a dual G4 can render out in two minutes per frame? That would take 10 hours total on that Mac, and only 4 minutes 52 seconds on the multi-TFLOP render farm. That means the multi-TFLOP render farm can push out almost one frame per second!!! ;) Of course, a scene that takes a mere two minutes to render on a CPU is very simplistic by comparison. The render farm, with between twice and four times the power we are talking about, needs thirty times longer than realtime to handle a basic scene that a consumer rig could render out in a couple of minutes per frame.
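(As a cross-check, the farm's quoted 2-4 TFLOPS matches the peak math, assuming 4 FLOPs per cycle scalar and 8 with SSE2:)

    # 250 rigs, each with either one 2 GHz chip or two 1 GHz chips = 2 GHz of clock per rig.
    total_clock = 250 * 2.0e9
    print(total_clock * 4 / 1e12)   # 2.0 TFLOPS at 4 FLOPs/cycle
    print(total_clock * 8 / 1e12)   # 4.0 TFLOPS at 8 FLOPs/cycle (SSE2 peak)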

Has Pixar used anything remotely as extensive as that in the past to make a movie?

A TFLOP is nothing big for a render farm; I would expect Pixar to be closer to PFLOP territory (I'm not sure on that, but I would expect it). Consider that the above renderfarm could be built for under $1 million (actually, likely less than half that) compared to Pixar's budgets, and the fact that renderfarms are usable for more than one picture.
 
I have stated repeatedly that 1 TFLOPS of general-purpose processing power won't do much against dedicated hardware, but Sony will almost certainly have a decent dedicated rasterizer.

Well, I'm not sure if I've got it right, but from what I understand even the non-rasterizer parts are not that general; aren't there a bunch of vector units and other stuff thrown in there? If so, it's more like the vertex processor pools in the GFX, only more programmable and CPU-ish... but again, I don't know much about this, so I'm not sure.

As for the petaflops at Pixar, I don't think so. Isn't the fastest supercomputer's perf less than even 50 TFLOPS? Pixar can't have a renderfarm that surpasses it by nearing a PFLOP. I mean, using Cells, assuming they have the low 1 TFLOP perf, it would take 1000 of those, and if each is equal to 100 P4 3 GHz chips, it would take near 100,000 Pentiums to achieve perf near a petaflop.

Anyway, it is my belief that in the end it will probably exceed 1 TFLOPS.
 
Wow, so those 3 GHz Xeon babies are quite powerful... 1024 of them can actually get nearer to what I expect the actual perf to be... hmmm, interesting...

edited
 
1024 * 2.8 GHz * 4 FLOPs/cycle ~ 11.5 TFLOPS

Looking at peak FLOPS ratings, you should use the theoretical 8 FLOPs per clock of the Xeon, not the 4 of the P3. Still a lot less than I expected, only ~23 TFLOPS. Surprising to me that they don't drop a bit more cash on renderfarms.
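(Both peak numbers, spelled out:)

    xeons = 1024
    clock = 2.8e9
    print(xeons * clock * 4 / 1e12)   # ~11.5 TFLOPS at 4 FLOPs/cycle
    print(xeons * clock * 8 / 1e12)   # ~22.9 TFLOPS at the theoretical 8 FLOPs/clock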

I mean, using Cells, assuming they have the low 1 TFLOP perf, it would take 1000 of those, and if each is equal to 100 P4 3 GHz chips, it would take near 100,000 Pentiums to achieve perf near a petaflop.

Hmm, 100,000 P4s @ 3 GHz would be 2.4 PFLOPS; that could likely do very nicely for realtime software rendering ;)
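(Same peak math: the 2.4 PFLOPS figure assumes the 8 FLOPs/clock peak:)

    print(100_000 * 3.0e9 * 8 / 1e15)   # 2.4 PFLOPS peak for 100,000 3 GHz P4s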
 